Preface


A half-century ago, advanced calculus was a well-defined subject at the core
of the undergraduate mathematics curriculum. The classic texts of Taylor [19],
Buck [1], Widder [21], and Kaplan [9], for example, show some of the ways it
was approached. Over time, certain aspects of the course came to be seen as more
significant (those seen as giving a rigorous foundation to calculus), and they
became the basis for a new course, an introduction to real analysis, that eventually
supplanted advanced calculus in the core.

Advanced calculus did not, in the process, become less important, but its role in 
the curriculum changed. In fact, a bifurcation occurred. In one direction we got
calculus on $n$-manifolds, a course beyond the practical reach of many undergraduates;
in the other, we got calculus in two and three dimensions but still with the theorems 
of Stokes and Gauss as the goal. 

The latter course is intended for everyone who has had a year-long introduction 
to calculus; it often has a name like Calculus III. In my experience, though, it does 
not manage to accomplish what the old advanced calculus course did. Multivariable 
calculus naturally splits into three parts: (1) several functions of one variable, (2) one 
function of several variables, and (3) several functions of several variables. The first 
two are well-developed in Calculus III, but the third is really too large and varied 
to be treated satisfactorily in the time remaining at the end of a semester. To put it 
another way: Green’s theorem fits comfortably; Stokes’ and Gauss’ do not. 

I believe the common view is that any such limitations of Calculus III are at 
worst only temporary because a student will eventually progress to the study of 
general $k$-forms on $n$-manifolds, the proper modern setting for advanced calculus.
But in the last half-century, undergraduate mathematics has changed in many ways, 
not just in the flowering of rigor and abstraction. Linear algebra has been brought 
forward in the curriculum, and with it an introduction to important multivariable 
functions. Differential equations now have a larger role in the first calculus course, 
too; students get to see something of their power and necessity. The computer vastly 
expands the possibilities for computation and visualization. 

The premise of this book is that these changes create the opportunity for a new 
geometric and visual approach to advanced calculus.

More than forty years ago, and long before the curriculum had evolved to its
present state, Andrew Gleason outlined a modern geometric approach in a series
of lectures, “The Geometric Content of Advanced Calculus” [8]. (In a companion
piece [17], Norman Steenrod made a similar assessment of the earlier courses in 
the calculus sequence.) Because undergraduate analysis bifurcated around the same 
time, Gleason’s insights have not been implemented to the extent that they might 
have been; nevertheless, they fit naturally into the approach I take in this book. 

Let me try to describe this geometric viewpoint and to indicate how it hangs 
upon recent curricular and technological developments. Geometry has always been 
bound up with the teaching of calculus, of course. Everyone associates the derivative 
of a function with the slope of its graph. But when the function becomes a map 
$\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^p$ with $n, p \ge 2$, we must ask: Where is the graph? What is its slope at a
point? Even in the simplest case $n = p = 2$, the graph (a two-dimensional surface)
lies in $\mathbb{R}^4$ and thus cannot be visualized directly. Nevertheless, we can get a picture if
we turn our attention from the graph to the image, because the image of $\mathbf{f}$ lies in the
$\mathbb{R}^2$ target. Computer algebra systems now make such pictures a practical possibility.
For example, the Mathematica command ParametricPlot produces a nonlinear 
grid that is the image under a given map of a uniform coordinate grid from its source. 
We can train ourselves to learn as much about a map from its image grid as we learn 
about a function from its graph. 
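
As a rough illustration of what an image grid contains, here is a minimal sketch in Python rather than Mathematica. The function name `image_grid`, its parameters, and the sample map `polar_map` are assumptions made for this example; they do not come from the text. The sketch samples evenly spaced grid lines in the source and records where the map sends them.

```python
import math

def image_grid(f, u_range, v_range, lines=5, samples=50):
    """Images under f of evenly spaced grid lines in the rectangle
    u_range x v_range. Returns a list of curves, each a list of (x, y)."""
    (u0, u1), (v0, v1) = u_range, v_range

    def lerp(a, b, t):
        return a + (b - a) * t

    curves = []
    for i in range(lines):
        c = i / (lines - 1)
        u_fixed = lerp(u0, u1, c)
        v_fixed = lerp(v0, v1, c)
        # Image of the source line u = u_fixed (v varies).
        curves.append([f(u_fixed, lerp(v0, v1, j / (samples - 1)))
                       for j in range(samples)])
        # Image of the source line v = v_fixed (u varies).
        curves.append([f(lerp(u0, u1, j / (samples - 1)), v_fixed)
                       for j in range(samples)])
    return curves

def polar_map(r, theta):
    """Sample map (an assumption): polar to Cartesian coordinates."""
    return (r * math.cos(theta), r * math.sin(theta))

grid = image_grid(polar_map, (1.0, 2.0), (0.0, math.pi / 2))
```

Each curve in `grid` can be handed to any plotting tool; for this sample map the straight source lines become circular arcs and radial segments.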

How do we picture the derivative in this setting? When we dealt with graphs, 
the derivative of a nonlinear function $f$ at the point $a$ was the linear function whose
graph was tangent to the graph of $f$ at $a$. Tangency implies that, under progressive
magnification at the point $(a, f(a))$, the two graphs look more and more alike. At
some stage the nonlinear function becomes indistinguishable from the linear one.
There are two subtly different concepts at play here, depending on what we mean
by “indistinguishable.” One is local linearity (or differentiability): $f(a + \Delta x) - f(a)$
and $f'(a)\,\Delta x$ are indistinguishable in the technical sense that their difference
vanishes to greater than first order in $\Delta x$. The other is looking linear locally: the graphs
themselves are indistinguishable under sufficient magnification. For our function $f$,
there is no difference: $f$ is locally linear precisely where it looks linear locally.

There is a real and important difference, though, when we replace graphs by 
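image grids, as we must do to visualize a map $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ and its derivative $d\mathbf{f}_{\mathbf{a}}$.
We say $\mathbf{f}$ is locally linear (or differentiable) at $\mathbf{a}$ if $\mathbf{f}(\mathbf{a} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{a})$ and $d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{x})$ are
indistinguishable in the sense that their difference vanishes to greater than first order
in $\|\Delta\mathbf{x}\|$. By contrast, we say $\mathbf{f}$ looks linear locally at $\mathbf{a}$ if the image grid of $\mathbf{f}$ near $\mathbf{a}$
is indistinguishable from the image grid of $d\mathbf{f}_{\mathbf{a}}$ under sufficient magnification. To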
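make the difference clear, consider the quadratic map $\mathbf{q}$ and its derivative at a point
$\mathbf{a} = (a, b)$:

$$\mathbf{q}(u, v) = (u^2 - v^2,\ 2uv), \qquad d\mathbf{q}_{\mathbf{a}} = \begin{pmatrix} 2a & -2b \\ 2b & 2a \end{pmatrix}.$$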

Because the derivative exists everywhere, $\mathbf{q}$ is locally linear everywhere. Moreover,
$\mathbf{q}$ also looks like its derivative under sufficient magnification as long as $\mathbf{a} \ne \mathbf{0}$. But
at the origin, $\mathbf{q}$ doubles angles and squares distances, and continues to do so at any
magnification. No linear map does this. Thus in no open neighborhood of the origin
does $\mathbf{q}$ look like any linear map, and certainly not its derivative, which is the zero
map. (There is no contradiction, of course, because the difference between $\mathbf{q}$ and its
derivative vanishes to second order at the origin.)
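
The behavior at the origin can be checked numerically. The sketch below assumes that the quadratic map is the one that sends $(u, v)$ to $(u^2 - v^2, 2uv)$, the map that doubles angles and squares distances; the helper name `ratios` and the sample angle are likewise assumptions made for the illustration.

```python
import math

def q(u, v):
    """Assumed form of the quadratic map: (u, v) -> (u^2 - v^2, 2uv)."""
    return (u * u - v * v, 2 * u * v)

def ratios(r, theta):
    """Push the point with polar coordinates (r, theta) through q and
    return (image distance / r^2, image angle / theta)."""
    x, y = q(r * math.cos(theta), r * math.sin(theta))
    return (math.hypot(x, y) / r ** 2, math.atan2(y, x) / theta)

# Shrinking r plays the role of increasing the magnification at the origin.
results = [ratios(r, 0.4) for r in (1.0, 1e-3, 1e-6)]
```

At every scale the first ratio stays near 1 and the second near 2, so no amount of magnification makes the image grid at the origin resemble that of a linear map.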

Quite generally, a locally linear map $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ need not look linear at a point;
however (as our example suggests), if the derivative is invertible at that point, the
map will look linear there. In fact, this is the essential geometric content of the
inverse function theorem. Here is why. By hypothesis, a linear coordinate change will
transform the derivative into the identity map. The local inverse for $\mathbf{f}$ that is provided
by the theorem can be viewed as another coordinate change, one that transforms $\mathbf{f}$
itself into the identity map, at least locally. Thus $\mathbf{f}$ must look like its derivative
locally because a suitable (composite) coordinate change will transform one into the
other. This leads us, in effect, to gather maps into geometric equivalence classes: 
two maps are equivalent if a coordinate change transforms one into another. In other 
words, a class consists of different coordinate descriptions of the same geometric 
action. The invertible maps together make up a single class. (Geometrically, there is 
only one invertible map!) 

For parametrized surfaces $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$, or more generally for maps in which the
source and target have different dimensions, invertibility of the derivative is out of 
the question. The appropriate notion here is maximal rank. Then, at a point where 
the derivative has maximal rank, the implicit function theorem implies that the map 
and its derivative once again look alike in a neighborhood of that point. Coordinate 
changes convert both into the standard form of either a linear injection or a linear 
projection. For each pair of source-target dimensions, maps whose derivatives have 
maximal rank at a point make up a single local geometric class. 

A nonlinear map can certainly have other local geometric forms; for example, 
a plane map can fold the plane on itself or it can wrap it doubly on itself (like q, 
above). The inverse and implicit function theorems imply that all such local
geometric forms must therefore occur at points where the derivative fails to have maximal
rank. Such points are said to be singular. The analysis of the singularities of a
differentiable map is an active area of current research that was initiated by Hassler
Whitney half a century ago [20] and guided to a mature form by René Thom in the
following decades. Although this book is not about map singularities, its geometric
approach reflects the way singularities are analyzed. There are further connections.
In 1975, I wrote a survey article on singularities of plane maps [2]; one of my aims
here is to provide more detailed background for that article. 

We do analyze singularities in one familiar setting: a real-valued function $f$. The
target dimension is now 1, so only the zero derivative fails to have maximal rank.
This happens precisely at a critical point, where all the linear terms in the Taylor
expansion of $f$ vanish. So we turn to the quadratic terms, that is, to the quadratic
form $Q$ defined by the Hessian matrix of the function at that point. Taylor’s theorem
assures us that the Hessian form approximates $f$ near the critical point (up to terms
that vanish to third order). We ask: does $f$ also look like its Hessian form near that
point?

Some condition is needed; for example, $f(x, y) = x^2 - y^4$ does not look like its
quadratic part $Q(x, y) = x^2$ near the origin. Morse’s lemma provides the condition:
$f$ does look like $Q$ near a critical point if the Hessian matrix has maximal rank.
That is to say, a local coordinate change in a neighborhood of the critical point will
transform the original function into its Hessian form, in effect, removing all higher-
order terms in the Taylor expansion of $f$.

A nondegenerate Hessian therefore has an invariant geometric meaning, but only 
at a critical point. At a noncritical point, even concavity, for example, fails to be 
preserved under all coordinate changes. More generally, if linear terms are present 
and “robust” in the Taylor expansion of $f$ at a point (i.e., they define a linear map
that has maximal rank), quadratic and higher terms have no invariant geometric 
meaning. This is the implicit function theorem speaking once again. 

By asking whether a map looks like the beginning of its Taylor series, we are 
led to see the underlying geometric character of the inverse and implicit function 
theorems and Morse’s lemma. The question thus provides a way to organize and 
unify much of our subject and, in so doing, to bring out its simple beauty. 

Let me now describe the geometric approach this book takes to another of its central 
themes: the change of variables formula for integrals. 

To fix ideas, suppose we have a double integral, so the change of variables is 
an invertible map of (a portion of) the plane. Locally, that map looks linear. Each 
linear map has a characteristic factor by which it magnifies areas. To a nonlinear 
map we can therefore assign a local area magnification factor at each point, the 
area magnification factor of its local linear approximation at that point. This is the 
Jacobian. 

In the simplest case, the integrand is identically equal to 1, and the value of the 
integral is just the area of the domain of integration. A change of variables maps 
that domain to a new one with, in general, a different area. If the map is linear, and 
has area multiplication factor M, the new area is just M times the old (or the integral 
of the constant M over the old domain). However, if the map is nonlinear, then we 
need to proceed in steps. First subdivide the old domain into small regions on each 
of which the local area magnification factor M (the Jacobian) is essentially constant. 
The area of the image of one small region is then approximately the product of its 
own area and the local value of M, and the area of the entire image is approximately 
the sum of those individual products. To get better approximations, make finer and 
finer subdivisions; in the limit, we have the area of the new region as the integral of 
the local area multiplication function M over the original domain. For an arbitrary 
integrand, transform the integral the same way: multiply the integrand by M. All of 
this is easily generalized from two to $n$ variables; areas become $n$-volumes.
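
The subdivision argument can be tried out directly. The following sketch is only an illustration under stated assumptions: the function name `image_area` is hypothetical, and the sample map is the polar coordinate map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$, whose local area magnification factor is $M = r$. It sums (local factor) times (area of a small source region) over a fine subdivision.

```python
import math

def image_area(magnification, u_range, v_range, n=200):
    """Approximate the area of the image of a rectangle by summing
    (local area magnification) x (cell area) over an n-by-n subdivision."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Evaluate the factor at the center of each small region.
            u = u0 + (i + 0.5) * du
            v = v0 + (j + 0.5) * dv
            total += abs(magnification(u, v)) * du * dv
    return total

# Source rectangle 0 <= r <= 1, 0 <= theta <= 2*pi; the image is the unit disk.
area = image_area(lambda r, theta: r, (0.0, 1.0), (0.0, 2 * math.pi))
```

For this sample map the sum reproduces the area of the unit disk, $\pi$, to within rounding error.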

A typical proof of the change of variables formula proceeds one dimension at a 
time; this tends to submerge the geometric force and meaning of the Jacobian M. By 
contrast, my proof in Chapter 9 follows the geometric argument above. I found it in 
an article by Jack Schwartz ([16]), who remarks that his proof appears to be new; he
could not find a similar argument in any of the standard calculus texts of the time.

* * *

One way I have chosen to stress the geometric is by concentrating on what happens
in two and three dimensions, where we can construct (with the help of a computer
algebra system as needed) illustrations that help us “see” theorems. And this is not
a bad thing: the words theorem and theatre stem from the same Greek root θέα,
“the act of seeing.” In a literal sense, a theorem is “that which is seen.” But the eye, 
and the mind’s eye not less, can play tricks. To be certain a theorem is true, we know 
we must test what we see. Here is where proof comes in: to prove means “to test.” 
The cognate form to probe makes this more evident; probate tests the validity of a 
will. Ordinary language supports this meaning, too: yeast is “proofed” before it is 
used to leaven bread dough, “the proof of the pudding is in the eating,” and “the 
exception proves the rule” because it tests how widely the rule applies. 

In much of mathematical exposition, proving is given more weight than seeing. 
Jean Dieudonné’s seminal Foundations of Modern Analysis [4] is a good example. In
the preface he argues for the “necessity of a strict adherence to axiomatic methods,
with no appeal to ‘geometric intuition’, at least in the formal proofs: a necessity
which we have emphasized by deliberately abstaining from introducing any diagram
in the book.” As prevalent as it is, the axiomatic tradition is not the only one. René
Thom, a contemporary of Dieudonné and Bourbaki, followed a distinctly different
geometric tradition in framing the study of map singularities, a study whose outlines 
have guided the development of this book. Although proof may be given a different 
weight in the geometric tradition, it still has a crucial role. I believe that a student 
who sees a theorem more fully has all the more reason to test its validity. 

But there is, of course, usually no reason to restrict the proofs themselves to 
low dimensions. For example, my proof of the inverse function theorem (Chapter 5, 
p. 169ff.) is for maps on $\mathbb{R}^n$. It elaborates upon Serge Lang’s proof for maps on
infinite-dimensional Banach spaces [10, 11]. Incidentally, Lang points out that, in 
finite dimensions, the inverse function theorem is often proven using the implicit 
function theorem, but that does not work in infinite dimensions. Lang gives the 
proofs the other way around, and I do the same. Furthermore, because there is so 
much instructive geometry associated with implicit functions, I provide not just a 
general proof but a sequence of more gradually complicated ones (Chapter 6) that 
fold in the growing geometric complexity that additional variables entail. I think 
the student benefits from seeing all this put together. Other important examples of 
$n$-dimensional proofs of theorems that are visualized primarily in $\mathbb{R}^2$ are Taylor’s
theorem (Chapter 3), the chain rule (Chapter 4), and Morse’s lemma (Chapter 7). 
The definition of the derivative gets the same kind of treatment as the proof of 
the implicit function theorem, and for the same reason. Unlike the other topics, 
integral proofs are mainly restricted to two dimensions. One reason is that the many 
technical details about Jordan content are easiest to see there. Another reason is that 
the extension to higher dimensions is straightforward and can be carried out by the 
student. 

At a couple of points in the text, I provide brief Mathematica commands that 
generate certain 3D images. Because programs like Mathematica are always being 
updated (and the Mathematica 5 code I have used in the text has already been
superseded), details are bound to change. My aim has simply been to indicate how
easy it is to generate useful images. I have also included a simple BASIC program
that calculates a Riemann sum for a particular double integral. Again, it is not my 
aim to advocate for a particular computational tool. Nevertheless, I do think it is 
important for students to see that programs do have a role—integrals arise out of 
computations—and that even a simple program can increase our power to estimate 
the value of an integral. 

To help keep the focus on geometry, I have excluded proofs of nearly all the 
theorems that are associated with introductory real analysis (e.g., those concerning 
uniform continuity, convergence of sequences of functions, or equality of mixed
partial derivatives). I consider real analysis to be a different course, one that is treated
thoroughly and well in a variety of texts at different levels, including the classics of 
Rudin [15] and Protter and Morrey [14]. To be sure, I am recalibrating the balance 
here between that which is seen and that which is tested. 

This book does not attempt to be an exhaustive treatment of advanced calculus. Even 
so, it has plenty of material for a year-long course, and it can be used for a variety 
of semester courses. (As I was writing, it occurred to me that a course is like a walk 
in the woods—a personal excursion—but a text must be like a map of the whole 
woodland, so that others can take walks of their own choosing.) My own course 
goes through the basics in Chapters 2-4 and then draws mainly on Chapters 9-11. 
A rather different one could go from the basics to inverse and implicit functions 
(Chapters 5 and 6), in preparation for a study of differentiable manifolds. The pace 
of the book, with its numerous visual examples to introduce new ideas and topics, 
is particularly suited for independent study. From start to finish, illustrations carry 
the same weight as text and the two are thoroughly interwoven. The eye has an 
important role to play. 

In addition to the CUPM Proceedings [12] that contain the lectures of Gleason
and Steenrod, I have been strongly influenced by the content and tone of the
beautiful three-volume Introduction to Calculus and Analysis [3] by Richard Courant and
Fritz John. In particular, I took their approach to integration via Jordan content. At
a different level of detail, I adopted their phrase order of vanishing as a
replacement for the less apt order of magnitude for vanishing quantities. For the theorems
connecting Riemann and Darboux integrals in Chapter 8, I relied on Protter and 
Morrey [14]; my own contribution was a number of figures to illustrate their proofs. 
It was Gleason who argued that the Morse lemma has a place in the undergraduate 
advanced calculus course. I was fully persuaded after my student Stephanie Jakus 
(Smith ’05) wrote her senior honors thesis on the subject. 

The Feynman Lectures on Physics [6] have had a pervasive influence on this 
book. First of all, Feynman’s vision of his subject, and his flair for explanation, is 
awe-inspiring. I felt I could find no better introduction to surface integrals than the 
context of fluid flux. Because physics works with two-dimensional surfaces in $\mathbb{R}^3$,
I also felt justified in concentrating my treatment of surface integrals on this case.
I believe students will have learned all they need in order to deal with the integral
of a $k$-form over a $k$-dimensional parametrized surface patch in $\mathbb{R}^n$, for arbitrary
$k < n$. In providing a physical basis for the curl, the Lectures prodded me to try to
understand it geometrically. The result is a discussion of the curl (in Chapter 11)
that—like the discussion of the Morse lemma—has not previously appeared in an 
advanced calculus text, as far as I am aware. 

I thank my students over the last decade for their curiosity, their perseverance, 
their interest in the subject, and their support. I especially thank Anne Watson
(Smith ’09), who worked with me to produce and check exercises. My editor at 
Springer, Kaitlin Leach, makes the rough places smooth; I am most fortunate to 
have worked with her. I am grateful to Smith College for its generous sabbatical 
policy; I wrote much of the book while on sabbatical during the 2005-2006
academic year. My deepest debt is to my teacher, Linus Richard Foy, who stimulated
my interest in both mathematics and teaching. In his advanced calculus course, I 
often caught myself trying to follow him along two tracks simultaneously: what he 
was saying, and how he was saying it. 


Amherst, MA 
June 2010 


James Callahan


Chapter 1

Starting Points 


Abstract Our goal in this book is to understand and work with integrals of functions 
of several variables. As we show, the integrals we already know from the
introductory calculus courses give us a basis for the understanding we need. The key idea
for our future work is change of variables. In this chapter, we review how we use
a change of variables to compute many one-variable integrals as well as path
integrals and certain double integrals that can be evaluated by making a change from
Cartesian to polar coordinates. 


1.1 Substitution 


There are two kinds of integral substitutions. As an example of the first kind,
consider the familiar integral

$$\int_0^b \frac{dx}{1 + x^2}.$$


We know that the substitution $x = \tan s$ is helpful here because $1 + x^2 = 1 + \tan^2 s = \sec^2 s$
and $dx = \sec^2 s\,ds$. Therefore,

$$\int \frac{dx}{1 + x^2} = \int \frac{\sec^2 s\,ds}{1 + \tan^2 s} = \int ds = s = \arctan x,$$

and we then have

$$\int_0^b \frac{dx}{1 + x^2} = \arctan x \,\Big|_0^b = \arctan b.$$
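
A quick numerical check of this value is easy to set up. The sketch below is an illustration only: the helper name `midpoint_sum`, the number of subintervals, and the sample endpoint $b = 2$ are assumptions chosen for the example.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

b = 2.0  # sample upper endpoint
approx = midpoint_sum(lambda x: 1.0 / (1.0 + x * x), 0.0, b)
exact = math.atan(b)
```

The two numbers `approx` and `exact` agree to many decimal places, as the substitution predicts.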


As an example of the second kind of substitution, take the apparently similar
integral

$$\int_0^b \frac{x\,dx}{(1 + x^2)^p}, \qquad p \ne 1.$$

The factor $x$ in the numerator suggests the substitution $u = 1 + x^2$. Then $du = 2x\,dx$
and

$$\int \frac{x\,dx}{(1 + x^2)^p} = \frac{1}{2} \int \frac{du}{u^p} = \frac{1}{2}\,\frac{u^{-p+1}}{-p + 1} = \frac{-1}{2(p - 1)(1 + x^2)^{p-1}}.$$

Thus,

$$\int_0^b \frac{x\,dx}{(1 + x^2)^p} = \frac{1}{2(p - 1)} \left[ 1 - \frac{1}{(1 + b^2)^{p-1}} \right].$$

In these examples, integration is done with the fundamental theorem of calculus. 
That is, we use the fact that the indefinite integral of a given function $f$,

$$F = \int f(x)\,dx,$$

is an antiderivative of $f$: $F'(x) = f(x)$. However, the substitutions we used to find
the two antiderivatives are different in important ways. 
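
The antiderivative property $F'(x) = f(x)$ can be tested numerically for both examples. This is a sketch under assumptions: the helper `central_difference`, the sample exponent $p = 3$, and the sample points are choices made for the illustration, and a finite-difference check is evidence, not a proof.

```python
import math

def central_difference(F, x, h=1e-5):
    """Numerical estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

p = 3.0  # sample exponent; any value other than 1 would do

pairs = [
    # (integrand f, claimed antiderivative F)
    (lambda x: 1 / (1 + x * x), math.atan),
    (lambda x: x / (1 + x * x) ** p,
     lambda x: -1 / (2 * (p - 1) * (1 + x * x) ** (p - 1))),
]

sample_points = [0.0, 0.5, 1.0, 2.0]
max_error = max(abs(central_difference(F, x) - f(x))
                for f, F in pairs for x in sample_points)
```

A very small `max_error` indicates that each claimed antiderivative does differentiate back to its integrand at the sampled points.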

We call the first an example of a pullback substitution, for reasons that become 
clear in a moment. In a pullback, we express the variable $x$ itself as some
differentiable function $x = \varphi(s)$ of a new variable $s$. Then $dx = \varphi'(s)\,ds$ and we get

$$\int f(x)\,dx = \int f(\varphi(s))\,\varphi'(s)\,ds = \Phi,$$

where $\Phi(s)$ is an antiderivative of $f(\varphi(s))\,\varphi'(s)$. Here the aim is to choose the
function $\varphi$ so the antiderivative $\Phi$ becomes evident. The indefinite integral we want
is then $F(x) = \Phi(\varphi^{-1}(x))$, where $s = \varphi^{-1}(x)$ is the inverse of the function $x = \varphi(s)$.
(In our example, $\varphi$ is the tangent function and $\varphi^{-1}$ is the arctangent function; $\Phi(s)$
is just $s$.) We also use $\varphi^{-1}$ to get the upper and lower endpoints of the definite
integral:

$$\int_a^b f(x)\,dx = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(s))\,\varphi'(s)\,ds.$$
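
To see the endpoint rule at work, we can evaluate both sides numerically for the tangent substitution. The sketch below is an illustration under assumptions: the helper `midpoint_sum`, the variable names, and the sample interval $0 \le x \le 3$ are not from the text.

```python
import math

def midpoint_sum(g, lo, hi, n=20_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 1 / (1 + x * x)          # integrand in x
phi = math.tan                          # pullback x = phi(s)
dphi = lambda s: 1 / math.cos(s) ** 2   # phi'(s) = sec^2 s
phi_inv = math.atan                     # s = phi^{-1}(x)

a, b = 0.0, 3.0
in_x = midpoint_sum(f, a, b)
in_s = midpoint_sum(lambda s: f(phi(s)) * dphi(s), phi_inv(a), phi_inv(b))
```

Here the pulled-back integrand is identically 1, so `in_s` is just the length of the interval from $\arctan a$ to $\arctan b$, and it matches `in_x`.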

In the second example, a push-forward substitution, we replace some functional 
expression $g(x)$ involving $x$ with a new variable $u$. As with $\varphi(s)$, it takes practice
and experience to make an effective choice of $g(x)$: the aim is to be able to write

$$f(x) = G'(g(x)) \cdot g'(x) \quad\text{or}\quad f(x)\,dx = G'(u)\,du$$

for a suitable function $G'(u)$. That is, $du = g'(x)\,dx$ and

$$\int f(x)\,dx = \int G'(g(x))\,g'(x)\,dx = \int G'(u)\,du = G,$$

and the antiderivative is $F(x) = G(g(x))$. In our example, $g(x) = 1 + x^2$,
$G'(u) = \dfrac{1}{2u^p}$, and $G(u) = \dfrac{-1}{2(p - 1)u^{p-1}}$.

Note that we use $g$ (and not $g^{-1}$) to get the endpoints of the transformed definite
integral:

$$\int_a^b f(x)\,dx = \int_{g(a)}^{g(b)} G'(u)\,du.$$

To see how the substitutions using $\varphi$ and $g$ are different, and also to see how they
got their names, let us think of them as maps:

$$s \xrightarrow{\ \varphi\ } x \xrightarrow{\ g\ } u$$

Then we can say $g$ pushes forward information about the value of $x$ to the variable $u$,
and $\varphi$ pulls back that information to $s$. Note that the pullback needs to be invertible:
without a well-defined $\varphi^{-1}$, a given value of $x$ may pull back to two or more different
values of $s$ or to none at all. This problem does not arise with $g$, though.
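
The claim that a push-forward does not need an inverse can be checked on the second example. In this sketch the exponent $p = 2$, the helper names, and the two sample intervals are assumptions made for the illustration; with $g(x) = 1 + x^2$ the integrand $x/(1 + x^2)^p$ becomes $1/(2u^p)$ in the variable $u$.

```python
def midpoint_sum(h_fn, lo, hi, n=20_000):
    """Midpoint-rule approximation to the integral of h_fn over [lo, hi]."""
    step = (hi - lo) / n
    return sum(h_fn(lo + (i + 0.5) * step) for i in range(n)) * step

p = 2.0                                  # sample exponent (not 1)
f = lambda x: x / (1 + x * x) ** p       # integrand in x
g = lambda x: 1 + x * x                  # push-forward u = g(x)
G_prime = lambda u: 1 / (2 * u ** p)     # integrand in u

def both_sides(a, b):
    """Integral in x over [a, b] and integral in u over [g(a), g(b)]."""
    return (midpoint_sum(f, a, b), midpoint_sum(G_prime, g(a), g(b)))

# g is one-to-one on [0, 2] but not on [-1, 2]; the rule holds in both cases.
case_1 = both_sides(0.0, 2.0)
case_2 = both_sides(-1.0, 2.0)
```

In both cases the two entries of the pair agree, even though on $[-1, 2]$ the map $g$ sends two different values of $x$ to the same $u$.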


To complete this section, let us review why the differential changes the way it 
does in a substitution. For example, in the pullback $x = \varphi(s)$, why is $dx = \varphi'(s)\,ds$?
The answer might seem obvious: because $dx/ds$ is just another notation for the
derivative (that is, $dx/ds = \varphi'(s)$), we simply multiply by $ds$ to get $dx = \varphi'(s)\,ds$.
This is a good mnemonic; however, it is not an explanation, because the expressions 
dx and ds have no independent meaning, at least as far as derivatives are concerned. 
We must look more carefully at the link between differentials and derivatives. 

In a linear function, $x = \varphi(s) = ms + b$, we usually interpret the coefficient $m$ as
the slope of the graph: $\Delta x/\Delta s = m$. However, if we rewrite the slope equation in the
form $\Delta x = m\,\Delta s$, it becomes natural to interpret $m$ instead as a multiplier. That is, our
linear map $\varphi : s \mapsto x$ multiplies lengths by the factor $m$: an interval of length $\Delta s$ on
the $s$-axis is mapped to an interval of length $\Delta x = m\,\Delta s$ on the $x$-axis. Furthermore,
when $m < 0$, $\Delta s$ and $\Delta x$ have opposite orientations, so $\varphi$ also carries out a “flip.”
(The role of the coefficient as a multiplier rather than as a slope suggests why it is
so commonly represented by the letter “m”.)

When $x = \varphi(s)$ is nonlinear, the slope of the graph (or the slope of its tangent line)
varies from point to point. Nevertheless, by fixing our attention on a small
neighborhood of a particular point $s = s_0$, we still have a way to interpret the derivative as
a multiplier. To see how this happens, recall first that we assume $\varphi$ is differentiable,
so

$$\varphi'(s_0) = \lim_{\Delta s \to 0} \frac{\Delta x}{\Delta s} = \lim_{s \to s_0} \frac{\varphi(s) - \varphi(s_0)}{s - s_0}.$$

According to the meaning of a limit, we can make $\Delta x/\Delta s$ as close to $\varphi'(s_0)$ as
we wish by making $\Delta s = s - s_0$ sufficiently small; in other words,

$$\Delta x \approx \varphi'(s_0)\,\Delta s \quad\text{when } \Delta s \approx 0.$$

To see what this means, focus a microscope at the point $(s_0, x_0)$ and use coordinates
$\Delta s = s - s_0$ and $\Delta x = x - x_0$ centered in this window. Then, under sufficient
magnification (i.e., with $\Delta s \approx 0$), $\varphi$ looks like $\Delta x \approx \varphi'(s_0)\,\Delta s$. We call this the microscope
equation for $x = \varphi(s)$ at $s_0$; it is linear, and defines the linear approximation of
the function $\varphi$ at $s_0$.

Finally, we can say that $\varphi$ is locally linear, in the sense that $x = \varphi(s)$ comes as
close as we wish to its linear approximation $\Delta x \approx \varphi'(s_0)\,\Delta s$ when $s$ is restricted to
a sufficiently small neighborhood of $s_0$. Thus, because the map $\varphi : s \to x$ is locally
linear at $s_0$, it multiplies lengths (approximately) by $\varphi'(s_0)$ in any sufficiently small
neighborhood of $s = s_0$.
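
The quality of the microscope equation can be measured numerically. The sketch below is an illustration under assumptions: the helper name `microscope_error`, the choice $\varphi = \tan$, and the base point $s_0 = 0.5$ are made up for the example. It records the gap between the true change $\Delta x$ and the linear estimate $\varphi'(s_0)\,\Delta s$ for shrinking windows.

```python
import math

def microscope_error(phi, dphi, s0, ds):
    """True change in x minus the linear estimate phi'(s0) * ds."""
    dx = phi(s0 + ds) - phi(s0)
    return dx - dphi(s0) * ds

s0 = 0.5
sec_squared = lambda s: 1 / math.cos(s) ** 2   # derivative of tan

# For each window size, store (ds, error, error relative to ds).
table = []
for ds in (1e-1, 1e-2, 1e-3):
    err = microscope_error(math.tan, sec_squared, s0, ds)
    table.append((ds, err, err / ds))
```

The last column shrinks along with $\Delta s$: the error is small even in comparison with the window size, which is what makes the linear picture faithful under magnification.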

With the microscope equation, we can now see why the differential transforms 
the way it does when we make a change of variables in an integral. First of all, a 
definite integral is defined as a limit of Riemann sums. In the simplest case (a
left-endpoint Riemann sum with $n$ equal subintervals) we can set $\Delta x = (b - a)/n$ and
$x_i = a + (i - 1)\,\Delta x$ and write

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,\Delta x.$$

We think of each term in the sum as the area of a rectangle with height $f(x_i)$ and
base $\Delta x$, as in the figure at the left, below.
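
The left-endpoint sum translates directly into a few lines of code. This is a minimal sketch; the function name `left_riemann_sum` and the sample integrand on $0 \le x \le 1$ are assumptions made for the illustration.

```python
import math

def left_riemann_sum(f, a, b, n):
    """Sum of f(x_i) * dx with x_i = a + (i - 1) * dx for i = 1, ..., n."""
    dx = (b - a) / n
    return sum(f(a + (i - 1) * dx) for i in range(1, n + 1)) * dx

f = lambda x: 1 / (1 + x * x)
coarse = left_riemann_sum(f, 0.0, 1.0, 10)
fine = left_riemann_sum(f, 0.0, 1.0, 1000)
limit_value = math.atan(1.0)   # the integral equals arctan 1 = pi/4
```

Increasing $n$ moves the sum toward the value of the integral; here `fine` is much closer to `limit_value` than `coarse` is.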



The figure at the right shows how the substitution $x = \varphi(s)$ pulls back our
partition of the interval $a \le x \le b$ to a partition of $\varphi^{-1}(a) \le s \le \varphi^{-1}(b)$. We set
$s_i = \varphi^{-1}(x_i)$ ($i = 1, \ldots, n + 1$) and $\Delta s_i = s_{i+1} - s_i$ ($i = 1, \ldots, n$). Note that the
subintervals $\Delta s_i$ are generally unequal when $\varphi$ is nonlinear. In fact, $\Delta s_i \approx \Delta x/\varphi'(s_i)$, by
the microscope equation. The pullback allows us to write

$$\sum_{i=1}^{n} f(x_i)\,\Delta x \approx \sum_{i=1}^{n} f(\varphi(s_i))\,\varphi'(s_i)\,\Delta s_i.$$

By choosing $n$ sufficiently large, we can make every $\Delta s_i$ arbitrarily small and thus
can make these two sums arbitrarily close. Notice that the right-hand side is also a
Riemann sum, in this case for the function $f(\varphi(s))\,\varphi'(s)$. Therefore, in the limit as
$n \to \infty$, the Riemann sums become integrals and we have the equality

$$\int_a^b f(x)\,dx = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(s))\,\varphi'(s)\,ds.$$

Thus we see that the justification for the transformation $dx = \varphi'(s)\,ds$ of differentials
in integration lies in the transformation $\Delta x \approx \varphi'(s_i)\,\Delta s_i$ that the microscope equation
provides for the Riemann sums.
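
The two Riemann sums can be computed side by side. The sketch below is an illustration under assumptions: the function name `pullback_sums` and the sample data ($x = \varphi(s) = s^2$ on $1 \le s \le 2$ with $f(x) = x$) are not from the text. The code uses 0-based indices, so `xs[i]` plays the role of $x_{i+1}$.

```python
import math

def pullback_sums(f, phi, dphi, phi_inv, a, b, n):
    """Return the left-endpoint sum in x and its pulled-back sum in s."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]   # equal partition of [a, b]
    ss = [phi_inv(x) for x in xs]             # pulled-back, unequal partition
    in_x = sum(f(xs[i]) * dx for i in range(n))
    in_s = sum(f(phi(ss[i])) * dphi(ss[i]) * (ss[i + 1] - ss[i])
               for i in range(n))
    return in_x, in_s

phi = lambda s: s * s
dphi = lambda s: 2 * s

gaps = []
for n in (100, 1000):
    in_x, in_s = pullback_sums(lambda x: x, phi, dphi, math.sqrt, 1.0, 4.0, n)
    gaps.append(abs(in_x - in_s))
```

The gap between the two sums shrinks as $n$ grows, and both approach the common value of the two integrals, which is 7.5 for this sample data.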

The microscope equation $\Delta x \approx \varphi'(s_i)\,\Delta s_i$ has one further geometric consequence.
In our Riemann sum for the second integral, the standard way to think about each
term is as the area of a rectangle with height $f(\varphi(s_i))\,\varphi'(s_i)$ and base $\Delta s_i$. However,
if we change the proportions and make the height $f(\varphi(s_i))$ and the base $\varphi'(s_i)\,\Delta s_i$,
then we have a rectangle that matches (as closely as we wish) the shape of the
original rectangle with height $f(x_i)$ and base $\Delta x$, because $f(x_i) = f(\varphi(s_i))$ and
$\Delta x \approx \varphi'(s_i)\,\Delta s_i$.




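[Figure: rectangles of height $f(\varphi(s_i))$ over the bases $\Delta s_i$ and $\varphi'(s_i)\,\Delta s_i$ on the $s$-axis, and the matching rectangle of height $f(x_i)$ over the base $\Delta x$ on the $x$-axis.]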

To understand why differentials transform the way they do, we worked with a 
pullback substitution. We get the same result with a push-forward, though the
details are different. Our work has led us to several questions that we ask again when
we turn to more general integrals that involve functions of several variables: what 
different kinds of substitutions occur? What role do inverses play? What is the form 
of a linear approximation? What is the analogue of the local length multiplier? What 
are differentials and how do they transform? What is the geometric interpretation of 
that transformation? 





1.2 Work and path integrals 

Path integrals are one of the centerpieces of the first multivariable calculus course, 
and they are often treated, as we do here, in the context of work. 

By definition, a force moving a body from one place to another produces work,
and the work done is proportional to both the force applied and to the displacement
caused. The simplest formula that captures this idea is

$$\text{work} = \text{force} \times \text{displacement}.$$

Although work is a scalar quantity, force and displacement are actually both
vectors, and the force is a field, that is, a variable function of position. We must elaborate
our simple formula to reflect these facts. Consider a straight-line displacement along
some vector $\Delta\mathbf{x}$ and a constant force field $\mathbf{F}$ that acts the same way at every point
along $\Delta\mathbf{x}$. Only the component of the force that lies in the direction of the
displacement does any work; this is the effective force $\mathbf{F}_{\text{eff}}$. We can take all this into account
in the new formula

\[ \text{work} = \|\mathbf{F}_{\text{eff}}\|\,\|\Delta\mathbf{x}\|. \]

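The scalar $\|\mathbf{F}_{\text{eff}}\|$ is the length of the perpendicular projection of $\mathbf{F}$ on $\Delta\mathbf{x}$. Now, in
general, for arbitrary vectors $\mathbf{A}$ and $\mathbf{B} \neq \mathbf{0}$,

\[ \text{length of projection of } \mathbf{A} \text{ onto } \mathbf{B} = \frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{B}\|}. \]

Rewriting the length $\|\mathbf{F}_{\text{eff}}\|$ this way, we see work is still a product; it is the dot (or
scalar) product of force $\mathbf{F}$ and displacement $\Delta\mathbf{x}$, now regarded as vectors:

\[ \text{work} = W = \frac{\mathbf{F}\cdot\Delta\mathbf{x}}{\|\Delta\mathbf{x}\|}\,\|\Delta\mathbf{x}\| = \mathbf{F}\cdot\Delta\mathbf{x}. \]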

Work is additive 

In our new formula, $W$ can take negative values (e.g., if $\mathbf{F}$ makes an obtuse angle
with $\Delta\mathbf{x}$). To see why “negative work” must arise, consider a constant force $\mathbf{F}$ that
displaces an object along a path consisting of two straight segments $\Delta\mathbf{x}_1$ and $\Delta\mathbf{x}_2$,
one immediately followed by the other. We want the total work to be the sum of the
work done on the separate segments:

\[ \text{total work} = \mathbf{F}\cdot\Delta\mathbf{x}_1 + \mathbf{F}\cdot\Delta\mathbf{x}_2. \]

Orientation matters 

We say that work is additive on displacements. In particular, if $\Delta\mathbf{x}_2 = -\Delta\mathbf{x}_1$, then
the total work done is 0. Consequently, the work done by $\mathbf{F}$ along $-\Delta\mathbf{x}$ must be the
negative of the work done by the same $\mathbf{F}$ along $+\Delta\mathbf{x}$. Orientation matters: reversing
the displacement reverses the sign of the work done.

Components of work: 

$W = P\,\Delta x + Q\,\Delta y$

Let us introduce coordinates into the plane containing the vectors $\mathbf{F}$ and $\Delta\mathbf{x}$ and
write $\mathbf{F} = (P, Q)$ and $\Delta\mathbf{x} = (\Delta x, \Delta y)$. Then

\[ W = \mathbf{F}\cdot\Delta\mathbf{x} = P\,\Delta x + Q\,\Delta y. \]

This formula gives the coordinate components of work. It says that, in the
$x$-direction, there is a force of size $P$ acting along a displacement of size $\Delta x$, doing
work $W_x = P\,\Delta x$. Similarly, in the $y$-direction the work done is $W_y = Q\,\Delta y$. We call
$W_x$ and $W_y$ the components of $W$ in the $x$- and $y$-directions. The following definition
summarizes our observations to this point.

Definition 1.1 The work done by the constant force $\mathbf{F} = (P, Q)$ in displacing an
object along the line segment $\Delta\mathbf{x} = (\Delta x, \Delta y)$ is

\[ W = \mathbf{F}\cdot\Delta\mathbf{x} = P\,\Delta x + Q\,\Delta y = W_x + W_y. \]
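As a small illustration of Definition 1.1, the following Python sketch computes $W = \mathbf{F}\cdot\Delta\mathbf{x}$ and checks additivity and the effect of reversing a displacement. The force and displacement values are hypothetical sample data, and the helper names `dot` and `work` are introduced here only for the example.

```python
def dot(u, v):
    """Dot product of two vectors given as equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def work(force, displacement):
    """Work done by a constant force along a straight displacement (Definition 1.1)."""
    return dot(force, displacement)


# Hypothetical sample data, chosen only for illustration.
F = (3.0, -1.0)      # constant force (P, Q)
dx1 = (2.0, 5.0)     # first displacement
dx2 = (-4.0, 1.0)    # second displacement, starting where the first ends

W1 = work(F, dx1)
W2 = work(F, dx2)
W_total = work(F, (dx1[0] + dx2[0], dx1[1] + dx2[1]))
W_reversed = work(F, (-dx1[0], -dx1[1]))

print(W1, W2, W_total)   # additivity: W1 + W2 equals W_total
print(W_reversed)        # reversing the displacement flips the sign of W1
```

Here $W_2$ is negative because the force makes an obtuse angle with the second displacement.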


Ultimately, we need to deal with variable forces and displacements along curved
paths. The prototype is a smooth simple curve $C$ in the plane. We say $C$ is smooth
if it is the image of a map (an example of a vector-valued function)

\[ \mathbf{x} : [a,b] \to \mathbb{R}^2 : t \mapsto (x(t), y(t)) \]

(a parametrization) whose coordinate functions $x(t)$ and $y(t)$ have continuous
derivatives on $a \le t \le b$. We call $t$ the parameter. In addition, $C$ is simple if it
has no self-intersections, that is, if $\mathbf{x}$ is 1-1. The parametrization orders the points
on $C$ in the following sense: $\mathbf{x}(t_1)$ precedes $\mathbf{x}(t_2)$ if $t_1 < t_2$ (i.e., $t_1$ precedes $t_2$ in
$[a,b]$). The ordering gives $C$ an orientation; we write $\vec{C}$ to indicate $C$ is oriented. At
any point on $C$ where the tangent vector $\mathbf{x}'(t)$ is nonzero, it points in the direction
of increasing $t$, and thus also indicates the orientation of $C$. We can immediately
extend these ideas to paths in $\mathbb{R}^n$.

Definition 1.2 A smooth, simple, oriented curve $C$ in $\mathbb{R}^n$ is the image of a smooth
1-1 map,

\[ \mathbf{x} : [a,b] \to \mathbb{R}^n : t \mapsto \mathbf{x}(t), \]

where $\mathbf{x}'(t) \neq \mathbf{0}$ for all $a < t < b$. The point $\mathbf{x}(a)$ is the start of $C$ and $\mathbf{x}(b)$ is its end.

The simple formula $W = \mathbf{F}\cdot\Delta\mathbf{x}$ for work assumes that the force $\mathbf{F}$ is constant,
so the location of the base point $\mathbf{a}$ of the displacement $\Delta\mathbf{x}$ is irrelevant. However,
if $\mathbf{F}$ varies, then the work done will depend on $\mathbf{a}$. We must, in fact, treat a linear
displacement as we would any displacement, and provide it with a parametrization.
A natural one is

\[ \mathbf{x}(t) = \mathbf{a} + t\,\Delta\mathbf{x}, \qquad 0 \le t \le 1. \]

We are now in a position to estimate the work done by a variable force as it
displaces an object along a smooth, simple, oriented curve $C$ in $\mathbb{R}^3$. Force is now a
(continuous) vector field, that is, a vector-valued function $\mathbf{F}(\mathbf{x})$ that varies
(continuously) with position $\mathbf{x}$. To estimate the work done, chop the curve into small pieces.
When a piece is small enough, it is essentially straight and the force is essentially
constant along it. On this piece, the linear formula for work (Definition 1.1) gives a
good approximation. By additivity, the sum of these contributions will approximate


the total work done along the whole curve. To get a better estimate, chop the curve 
into even smaller pieces. 

In more detail, let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$ be an ordered sequence of points on $C$, with
$\mathbf{x}_1$ at the start of $C$ and $\mathbf{x}_{k+1}$ at the end. We say $\{\mathbf{x}_i\}$ is a partition that respects the
orientation of $C$. Let the oriented curve $C_i$ be the portion of $C$ from $\mathbf{x}_i$ to $\mathbf{x}_{i+1}$, and
let $W_i$ be the work done by $\mathbf{F}$ along $C_i$; then, by the additivity of work,

\[ \text{total work done by } \mathbf{F} = \sum_{i=1}^{k} W_i. \]



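Let $\Delta\mathbf{x}_i = \mathbf{x}_{i+1} - \mathbf{x}_i$ be the linear displacement with base point $\mathbf{x}_i$. When $\|\Delta\mathbf{x}_i\|$ is
sufficiently small, $\Delta\mathbf{x}_i$ will be as close to the curved segment $C_i$ as we wish, because $C_i$
is smooth. Moreover, $\mathbf{F}$ will be nearly constant along $\Delta\mathbf{x}_i$, because $\mathbf{F}$ is continuous.
In particular, $\mathbf{F}(\mathbf{x})$ will differ by an arbitrarily small amount from its value $\mathbf{F}(\mathbf{x}_i)$ at
the base point of $\Delta\mathbf{x}_i$. Therefore, $W_i \approx \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i$. If we choose $k$ large enough and
make each $\|\Delta\mathbf{x}_i\|$ sufficiently small, then the sum

\[ \sum_{i=1}^{k} \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i \]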

will approximate the total work as closely as we wish. In fact, this last expression
is a Riemann sum for a new kind of integral, called a path, or line, integral, that
we now define quite generally for smooth, simple, oriented paths in any dimension.
Note that the definition does not depend on the parametrization of the path.

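Definition 1.3 (Smooth path integral) The integral of the continuous vector-valued
function $\mathbf{F}(\mathbf{x})$ over the smooth, simple, oriented curve $C$ in $\mathbb{R}^n$ is

\[ \int_C \mathbf{F}\cdot d\mathbf{x} = \lim_{\substack{k\to\infty \\ \text{mesh}\to 0}} \sum_{i=1}^{k} \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i, \]

if the limit exists when taken over all ordered partitions $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$ of $C$ with
$\text{mesh} = \max_i \|\Delta\mathbf{x}_i\|$ and $\Delta\mathbf{x}_i = \mathbf{x}_{i+1} - \mathbf{x}_i$, $i = 1, \ldots, k$.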

We can now define a more general collection of integration paths. If we allow the
start and end of $C$ to coincide (the tangent directions need not agree) and there are
no other self-intersections, we say that $C$ is a simple closed curve. The definition of
the path integral of $\mathbf{F}$ over $C$ is unchanged. A piecewise-smooth, oriented path $C$
is the union of smooth oriented paths $C_1, C_2, \ldots, C_m$, each of which is either simple
or a simple closed curve. We write $C = C_1 + \cdots + C_m$ and define

\[ \int_C \mathbf{F}\cdot d\mathbf{x} = \int_{C_1+\cdots+C_m} \mathbf{F}\cdot d\mathbf{x} = \int_{C_1} \mathbf{F}\cdot d\mathbf{x} + \cdots + \int_{C_m} \mathbf{F}\cdot d\mathbf{x}. \]

The combined path $C$ may be neither simple, smooth, nor even connected. A special
case arises when the pieces $C_j$ fit together, with the end of $C_j$ coinciding with the
start of $C_{j+1}$, for every $j = 1, \ldots, m-1$. Then $C$ is a single connected curve that is
smooth everywhere except possibly at the points where the pieces join.

Because the work done by the force $\mathbf{F}$ in displacing an object along a smooth,
simple, oriented path $C_i$ is the limit of the same sums that define the integral of $\mathbf{F}$
over $C_i$, and because we want work to be additive over the sum $C$ of such paths
in $\mathbb{R}^3$, we make the following definition.

Definition 1.4 The work done by the force $\mathbf{F}$ along the smooth oriented path $C$ is

\[ W = \int_C \mathbf{F}\cdot d\mathbf{x}. \]

How shall we compute the value of a path integral? Integrals—both ordinary 
integrals and path integrals—are limits of Riemann sums, but we rarely calculate 
them that way. To evaluate an ordinary integral, the common practice is to invoke 
the fundamental theorem of calculus to treat the integral as an antiderivative, and 
then use various techniques to find the antiderivative. To evaluate a path integral, 
our practice is to pull it back to an ordinary integral using the parametrization of 
the path. In the process, we demonstrate that the path integral exists; that is, the 
Riemann sums defining it have a limit. 

Let $C$ be a smooth, simple, oriented curve, and suppose it is parametrized by $\mathbf{x}(t)$,
$a \le t \le b$. Let $\mathbf{F}(\mathbf{x})$ be a continuous vector function defined on $C$. To decide whether
the integral of $\mathbf{F}$ over the path $C$ exists and has a finite value (Definition 1.3), we
choose a partition $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$ that respects the orientation of $C$ and form the
Riemann sum

\[ \sum_{i=1}^{k} \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i, \qquad \Delta\mathbf{x}_i = \mathbf{x}_{i+1} - \mathbf{x}_i \]

for the path integral. Because $C$ is simple, the parametrization $\mathbf{x}(t)$ is 1-1, so there
is a unique partition

\[ a = t_1 < t_2 < \cdots < t_{k+1} = b \]

of $[a,b]$ with $\mathbf{x}_i = \mathbf{x}(t_i)$, $i = 1, \ldots, k+1$. The microscope equation implies that

\[ \Delta\mathbf{x}_i = \mathbf{x}(t_{i+1}) - \mathbf{x}(t_i) \approx \mathbf{x}'(t_i)(t_{i+1} - t_i) = \mathbf{x}'(t_i)\,\Delta t_i \]




when $\Delta t_i = t_{i+1} - t_i \approx 0$. Therefore, if every $\Delta t_i \approx 0$,

\[ \sum_{i=1}^{k} \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i \approx \sum_{i=1}^{k} \mathbf{F}(\mathbf{x}(t_i))\cdot\mathbf{x}'(t_i)\,\Delta t_i. \]


The expression on the right is a Riemann sum for the ordinary integral

\[ \int_a^b \mathbf{F}(\mathbf{x}(t))\cdot\mathbf{x}'(t)\,dt. \]

Because $\mathbf{F}(\mathbf{x}(t))\cdot\mathbf{x}'(t)$ is continuous on $a \le t \le b$, this ordinary integral exists and
is the limit of its Riemann sums. The Riemann sums for the path integral (the sums
on the left) must likewise converge, and to the same limit. This establishes the
following theorem.

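Theorem 1.1. Suppose $C$ is a smooth, simple, oriented curve in $\mathbb{R}^n$ that is
parametrized by $\mathbf{x}(t)$, $a \le t \le b$. If $\mathbf{F}(\mathbf{x})$ is a continuous vector function defined on $C$, then
the integral of $\mathbf{F}$ over the path $C$ exists, and

\[ \int_C \mathbf{F}\cdot d\mathbf{x} = \int_a^b \mathbf{F}(\mathbf{x}(t))\cdot\mathbf{x}'(t)\,dt. \]

□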


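Corollary 1.2 Suppose $C = C_1 + \cdots + C_m$, where $C_j$ is a smooth, simple, oriented
curve in $\mathbb{R}^n$ parametrized by $\mathbf{x}_j(t)$, $a_j \le t \le b_j$, $j = 1, \ldots, m$; then

\[ \int_C \mathbf{F}\cdot d\mathbf{x} = \int_{a_1}^{b_1} \mathbf{F}(\mathbf{x}_1(t))\cdot\mathbf{x}_1'(t)\,dt + \cdots + \int_{a_m}^{b_m} \mathbf{F}(\mathbf{x}_m(t))\cdot\mathbf{x}_m'(t)\,dt. \]

□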

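Note that the conversion from path integral to ordinary integral is a pullback: by
means of the map $\mathbf{x} : [a,b] \to \mathbb{R}^n$, the integrand $\mathbf{F}(\mathbf{x})$ on $C$ is pulled back to $\mathbf{F}(\mathbf{x}(t))$
on $[a,b]$, and the differential $d\mathbf{x}$ is pulled back to $\mathbf{x}'(t)\,dt$.

\[ [a,b] \xrightarrow{\ \mathbf{x}\ } C, \qquad \int_a^b \mathbf{F}(\mathbf{x}(t))\cdot\mathbf{x}'(t)\,dt = \int_C \mathbf{F}\cdot d\mathbf{x} \]

The equality $d\mathbf{x} = \mathbf{x}'\,dt$ of differentials comes from the microscope equation, $\Delta\mathbf{x} \approx
\mathbf{x}'\,\Delta t$, just as it did in the pullback of ordinary integrals.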

Here is a simple example in $\mathbb{R}^2$ to illustrate how the pullback substitution works
to evaluate a path integral. We take $\mathbf{F} = (0, x)$; this represents a vertical force whose
magnitude at any point is proportional to the distance from that point to the $y$-axis.
We take $C$ to be the right third of the circle $x^2 + y^2 = 4$, oriented counterclockwise.
The map

\[ \mathbf{x}(t) = (2\cos t, 2\sin t), \qquad -\pi/3 \le t \le \pi/3, \]

parametrizes $C$, provides the correct orientation, and gives

\[ \mathbf{F} = \mathbf{F}(\mathbf{x}(t)) = (0, 2\cos t), \qquad d\mathbf{x} = \mathbf{x}'\,dt = (-2\sin t, 2\cos t)\,dt. \]



The work $W$ done by $\mathbf{F}$ along $C$ is

\[ \int_C \mathbf{F}\cdot d\mathbf{x} = \int_{-\pi/3}^{\pi/3} (0, 2\cos t)\cdot(-2\sin t, 2\cos t)\,dt = \int_{-\pi/3}^{\pi/3} 4\cos^2 t\,dt = \tfrac{4}{3}\pi + \sqrt{3}. \]


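The work done by the same force along the curve $-C$ with the opposite orientation
should be $-W$. To verify this independently, we can parametrize $-C$ as

\[ \mathbf{x} = (2\sin t, 2\cos t), \qquad \pi/6 \le t \le 5\pi/6; \]

then

\[ \mathbf{F} = (0, 2\sin t), \qquad d\mathbf{x} = (2\cos t, -2\sin t)\,dt, \qquad \mathbf{F}\cdot d\mathbf{x} = -4\sin^2 t\,dt, \]

and

\[ \text{work done along } {-C} = \int_{-C} \mathbf{F}\cdot d\mathbf{x} = \int_{\pi/6}^{5\pi/6} -4\sin^2 t\,dt = -\tfrac{4}{3}\pi - \sqrt{3}. \]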
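The Riemann sums of Definition 1.3 can also be evaluated directly for this example. The following Python sketch is only an illustration: it uses equally spaced parameter values to build the partition (an assumption made for convenience), and the function names are hypothetical. Running the partition backwards traverses $-C$.

```python
import math


def F(p):
    """The force field F = (0, x) of Example 1."""
    x, y = p
    return (0.0, x)


def circle(t):
    """Parametrization x(t) = (2 cos t, 2 sin t)."""
    return (2 * math.cos(t), 2 * math.sin(t))


def riemann_path_sum(field, curve, a, b, k):
    """Sum of field(x_i) . (x_{i+1} - x_i) over k equal parameter steps from a to b."""
    total = 0.0
    for i in range(k):
        p0 = curve(a + (b - a) * i / k)
        p1 = curve(a + (b - a) * (i + 1) / k)
        f = field(p0)
        total += f[0] * (p1[0] - p0[0]) + f[1] * (p1[1] - p0[1])
    return total


exact = 4 * math.pi / 3 + math.sqrt(3)
W_forward = riemann_path_sum(F, circle, -math.pi / 3, math.pi / 3, 20000)
W_backward = riemann_path_sum(F, circle, math.pi / 3, -math.pi / 3, 20000)

print(W_forward, exact)   # the two values should be close
print(W_backward)         # close to -exact: orientation matters
```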

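A curve can be parametrized in more than one way. Will that change the value
of $W$? Notice that the (unoriented) curve $C$ in our example is also the graph of
the function $x = \sqrt{4 - y^2}$, $-\sqrt{3} \le y \le \sqrt{3}$. We can therefore use $y$ itself as the
parameter. The map

\[ \mathbf{r}(y) = \left(\sqrt{4 - y^2},\, y\right), \qquad -\sqrt{3} \le y \le \sqrt{3}, \]

parametrizes $C$ with the proper orientation, and

\[ \mathbf{F} = \mathbf{F}(\mathbf{r}(y)) = \left(0, \sqrt{4 - y^2}\right), \qquad d\mathbf{r} = \mathbf{r}'(y)\,dy = \left(\frac{-y}{\sqrt{4 - y^2}},\, 1\right) dy. \]

Using the new parametrization to provide the pullback, we find the work done is

\[ W = \int_C \mathbf{F}\cdot d\mathbf{r} = \int_{-\sqrt{3}}^{\sqrt{3}} \sqrt{4 - y^2}\,dy = \sqrt{3} + \frac{4\pi}{3}. \]

Thus $W$ is unchanged. The exercises provide a third parametrization of $C$; you are
asked to verify that it also gives $W = \sqrt{3} + 4\pi/3$.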
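A numerical comparison of the two pullbacks is sketched below in Python. The `simpson` helper is a standard composite Simpson rule written out here for self-containment; it and the other function names are assumptions of this sketch, not part of the text.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand_t(t):
    """F(x(t)) . x'(t) for x(t) = (2 cos t, 2 sin t) and F = (0, x)."""
    force = (0.0, 2 * math.cos(t))
    velocity = (-2 * math.sin(t), 2 * math.cos(t))
    return force[0] * velocity[0] + force[1] * velocity[1]


def integrand_y(y):
    """F(r(y)) . r'(y) for r(y) = (sqrt(4 - y^2), y) and F = (0, x)."""
    root = math.sqrt(4 - y * y)
    force = (0.0, root)
    velocity = (-y / root, 1.0)
    return force[0] * velocity[0] + force[1] * velocity[1]


W_t = simpson(integrand_t, -math.pi / 3, math.pi / 3)
W_y = simpson(integrand_y, -math.sqrt(3), math.sqrt(3))
exact = math.sqrt(3) + 4 * math.pi / 3

print(W_t, W_y, exact)   # all three should agree to many digits
```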

The example prompts us to consider different parametrizations of the same
smooth, simple, oriented curve $C$. One way to generate a new parametrization is
by a smooth parameter change $t = h(u)$:

\[ \mathbf{r}(u) = \mathbf{x}(h(u)), \qquad c \le u \le d. \]

Here $h : [c,d] \to [a,b]$ is continuously differentiable, 1-1, onto, and $h'(u) > 0$ for all
$c < u < d$. Note that $\mathbf{r}$ has the same image as $\mathbf{x}$. Furthermore,

\[ \mathbf{r}'(u) = \mathbf{x}'(h(u))\,h'(u); \]

because $h'(u) > 0$, the tangent vectors $\mathbf{r}'(u)$ and $\mathbf{x}'(h(u))$ point in the same direction
for all $c < u < d$. Thus $\mathbf{r}$ induces the same orientation as $\mathbf{x}$, and provides an alternate
parametrization of $C$.




The two parametrizations of the counterclockwise-oriented circular arc of
radius 2 (Example 1, above) are connected by a parameter change; it is $t = h(y) =
\arcsin(y/2)$. To see that this is the formula for $h$, first note that

\[ (2\cos t, 2\sin t) = \mathbf{x}(t) = \mathbf{r}(y) = \left(\sqrt{4 - y^2},\, y\right). \]

Equality of the $y$-components gives $2\sin t = y$, so $t = h(y) = \arcsin(y/2)$. It
remains to check that the $x$-components transform properly under the same parameter
change. But because $\cos(\arcsin u) = \sqrt{1 - u^2}$, we have

\[ x = 2\cos t = 2\cos h(y) = 2\cos(\arcsin(y/2)) = 2\sqrt{1 - (y/2)^2} = \sqrt{4 - y^2}, \]

as required. According to Theorem 1.4, below, any two parametrizations of a smooth
simple curve are connected by a smooth change of parameter. This is easiest to see
after we have introduced the special arc-length parametrization (p. 15ff).


Component notation 



We saw earlier that, in the simple case of a constant force acting along a straight
line, we could break down the work into coordinate components. For example, in
$\mathbb{R}^2$, if $\mathbf{F} = (P, Q)$ and $\Delta\mathbf{x} = (\Delta x, \Delta y)$, then $W$ has components $P\,\Delta x$ and $Q\,\Delta y$. Let us
see how we can do the same for a variable force or, more generally, for any path
integral in $\mathbb{R}^n$.

For simplicity, we first take $\mathbf{F}$ and $C$ in the $(x,y)$-plane. Let $\mathbf{x}_i = (x_i, y_i)$ be a fine
partition of $C$ into oriented segments $C_i$, and let $\mathbf{x}_{i+1} - \mathbf{x}_i = \Delta\mathbf{x}_i = (\Delta x_i, \Delta y_i)$. Let
$P(x,y)$ and $Q(x,y)$ be the components of $\mathbf{F}$ that act at the point $\mathbf{x} = (x,y)$:

\[ \mathbf{F}(x,y) = (P(x,y), Q(x,y)). \]

Then we can estimate the work done by $\mathbf{F}$ along the segment $C_i$ as

\[ \Delta W_i = \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i = P(x_i, y_i)\,\Delta x_i + Q(x_i, y_i)\,\Delta y_i. \]

The figure indicates that the work estimate splits into two parts: a horizontal force
$P(x_i, y_i)$ acting through the horizontal displacement $\Delta x_i$, and a vertical force $Q(x_i, y_i)$
acting through a vertical displacement $\Delta y_i$. Our estimate for the total work along $C$ is

\[ \sum_{i=1}^{k} \Delta W_i = \sum_{i=1}^{k} \mathbf{F}(\mathbf{x}_i)\cdot\Delta\mathbf{x}_i = \sum_{i=1}^{k} P(x_i, y_i)\,\Delta x_i + Q(x_i, y_i)\,\Delta y_i. \]

These estimates converge to the path integral—and hence to the work done—as the 
mesh size of the partition tends to zero. We can regard each expression as a Riemann 
sum for the limiting integral; the three different ways of writing the Riemann sum 
lead us to three different ways to write the integral: 

\[ W = \int_C dW = \int_C \mathbf{F}\cdot d\mathbf{x} = \int_C P\,dx + Q\,dy. \]

On the right is the component form of the work integral. 

If we adopt the informal practice of regarding an integral as an infinite sum of 
“infinitesimal” terms, then the integrand in the work integral is the infinitesimal 
amount of work $dW$ done along an infinitesimal segment $d\mathbf{x} = (dx, dy)$:

\[ dW = \mathbf{F}\cdot d\mathbf{x} = (P, Q)\cdot(dx, dy) = P\,dx + Q\,dy. \]

From this point of view, the expressions Pdx and Qdy are the horizontal and vertical 
components of the infinitesimal work dW. 

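Moving to $\mathbb{R}^n$, we set $\mathbf{x} = (x_1, \ldots, x_n)$, $d\mathbf{x} = (dx_1, \ldots, dx_n)$, and take $P_i$ to be the
$i$th component of $\mathbf{F}$, so

\[ \mathbf{F}(\mathbf{x}) = (P_1(x_1, \ldots, x_n), \ldots, P_n(x_1, \ldots, x_n)), \]
\[ dW = \mathbf{F}\cdot d\mathbf{x} = P_1\,dx_1 + \cdots + P_n\,dx_n. \]

Suppose $\mathbf{x}(t) = (\varphi_1(t), \ldots, \varphi_n(t))$, $a \le t \le b$, parametrizes the oriented curve $C$; then
$d\mathbf{x} = (\varphi_1'(t), \ldots, \varphi_n'(t))\,dt$ and

\begin{align*}
W = \int_C dW &= \int_C \mathbf{F}\cdot d\mathbf{x} = \int_C P_1\,dx_1 + \cdots + P_n\,dx_n \\
&= \int_a^b \bigl[ P_1(\varphi_1(t), \ldots, \varphi_n(t))\,\varphi_1'(t) + \cdots + P_n(\varphi_1(t), \ldots, \varphi_n(t))\,\varphi_n'(t) \bigr]\,dt.
\end{align*}

The final expression is an ordinary integral; it gives us a way to compute the path
integral by means of the $n$ pullback substitutions $x_1 = \varphi_1(t), \ldots, x_n = \varphi_n(t)$.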

As an example of a path integral in $\mathbb{R}^2$ given in terms of its two components, let
us determine

\[ \int_C xy\,dx + \frac{x^2}{y}\,dy, \]

where $C$ is the piecewise-smooth path consisting of the horizontal segment $C_1$ from
$(1,7)$ to $(5,7)$ followed by the vertical segment $C_2$ from $(5,7)$ to $(5,2)$, traversed
in that order. On $C_1$, $y = 7$ and $dy = 0$; we can simply take $x$ as the parameter.
On $C_2$, $x = 5$, $dx = 0$; we take $y$ as the parameter and note that we must integrate
“backwards” from $y = 7$ to $y = 2$. The path integral equals


\[ \int_1^5 7x\,dx + \int_7^2 \frac{25}{y}\,dy = \frac{7x^2}{2}\,\Big|_1^5 + 25\ln y\,\Big|_7^2 = 84 + 25\ln\!\left(\tfrac{2}{7}\right). \]
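The component form lends itself to a direct numerical check. In the Python sketch below, each segment is pulled back to its own parameter, and integrating from 7 down to 2 is handled simply by passing the limits in that order. The `simpson` helper and the names `P`, `Q` are conveniences of this sketch.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule; works for a > b as well (the step is then negative)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def P(x, y):
    return x * y          # coefficient of dx


def Q(x, y):
    return x * x / y      # coefficient of dy


# C1: y = 7, dy = 0, parameter x from 1 to 5.
I1 = simpson(lambda x: P(x, 7.0), 1.0, 5.0)
# C2: x = 5, dx = 0, parameter y from 7 down to 2.
I2 = simpson(lambda y: Q(5.0, y), 7.0, 2.0)

total = I1 + I2
expected = 84 + 25 * math.log(2 / 7)
print(total, expected)
```

The second contribution is negative, reflecting the downward orientation of $C_2$.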


We now define the length of a smooth, simple, oriented curve $C$ in $\mathbb{R}^n$. Let $\mathbf{x}_1$,
$\mathbf{x}_2, \ldots, \mathbf{x}_{k+1}$ be an ordered partition that respects the orientation of $C$. The vectors
$\Delta\mathbf{x}_i = \mathbf{x}_{i+1} - \mathbf{x}_i$, $i = 1, \ldots, k$, are straight-line segments that make up what is called
a polygonal approximation to $C$. If we let $\Delta s_i = \|\Delta\mathbf{x}_i\|$ denote the length of the $i$th
segment, then we can express the length of the whole polygonal approximation as
the sum

\[ \sum_{i=1}^{k} \Delta s_i. \]

This expression is also a Riemann sum for a new kind of path integral. If the
Riemann sums have a limit, so that the new path integral is defined, then that path
integral can serve to define the length of $C$.

Definition 1.5 The arc length of the smooth, simple, oriented curve $C$ is

\[ L = \lim_{\substack{k\to\infty \\ \text{mesh}\to 0}} \sum_{i=1}^{k} \Delta s_i = \int_C ds, \qquad \text{mesh} = \max_i \Delta s_i = \max_i \|\Delta\mathbf{x}_i\|, \]

if this limit exists when taken over all ordered partitions $\{\mathbf{x}_i\}$ of $C$; we call $ds$ the
element of arc length for $C$.

The definition is not a practical computational tool. However, we can compute
arc length by following the same approach we took to compute the path integral for
work: use the parametrization of the curve to pull back the integration to the real
line. The justification is essentially the same as the one for Theorem 1.1, page 10.
Let $\mathbf{x}(t)$, $a \le t \le b$, parametrize $C$. Because $C$ is simple, there is a unique partition

\[ a = t_1 < t_2 < \cdots < t_{k+1} = b \]

of $[a,b]$ with $\mathbf{x}_i = \mathbf{x}(t_i)$. If the partition is sufficiently fine, then the microscope
equation implies $\Delta\mathbf{x}_i \approx \mathbf{x}'(t_i)\,\Delta t_i$ (where $\Delta t_i = t_{i+1} - t_i$) as closely as we wish. Therefore

\[ \sum_{i=1}^{k} \Delta s_i = \sum_{i=1}^{k} \|\Delta\mathbf{x}_i\| \approx \sum_{i=1}^{k} \|\mathbf{x}'(t_i)\|\,\Delta t_i; \]

the last is a Riemann sum for the ordinary integral

\[ \int_a^b \|\mathbf{x}'(t)\|\,dt. \]

Because $\|\mathbf{x}'(t)\|$ is continuous, the Riemann sums converge to the integral. The
numbers $\sum_i \Delta s_i$ must converge as well, giving the arc length of $C$ and proving the
following theorem.
Theorem 1.3. If $C$ is a smooth, simple, oriented curve parametrized as $\mathbf{x}(t)$, with
$a \le t \le b$, then

\[ \int_C ds = \text{arc length of } C = \int_a^b \|\mathbf{x}'(t)\|\,dt. \]

□
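To see the polygonal approximations of Definition 1.5 converge, the following Python sketch measures inscribed polygons on the circular arc of Example 1, $\mathbf{x}(t) = (2\cos t, 2\sin t)$, $-\pi/3 \le t \le \pi/3$. Its speed is $\|\mathbf{x}'(t)\| = 2$, so the theorem gives length $2\cdot(2\pi/3) = 4\pi/3$. Equal parameter spacing and the function names are assumptions of the sketch.

```python
import math


def polygonal_length(curve, a, b, k):
    """Length of the polygon through k + 1 points curve(t_i), t_i equally spaced in [a, b]."""
    points = [curve(a + (b - a) * i / k) for i in range(k + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def arc(t):
    """Circular arc of radius 2 from Example 1."""
    return (2 * math.cos(t), 2 * math.sin(t))


a, b = -math.pi / 3, math.pi / 3
exact = 2 * (b - a)   # integral of the constant speed 2 over [a, b]

approximations = {k: polygonal_length(arc, a, b, k) for k in (4, 16, 64, 256)}
for k, length in approximations.items():
    print(k, length, exact - length)   # the shortfall shrinks as k grows
```

Each polygon is shorter than the arc, and refining the partition closes the gap.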


Let $C_t$ denote the segment of $C$ parametrized by $\mathbf{x}(v)$ with $a \le v \le t$. Then the
arc length of $C_t$ is the function

\[ s(t) = \int_a^t \|\mathbf{x}'(v)\|\,dv. \]

Note that $0 \le s(t) \le L$ = arc length of $C$. By the fundamental theorem of calculus,
$s'(t) = \|\mathbf{x}'(t)\|$. Because $\|\mathbf{x}'(t)\| > 0$ on $a < t < b$, the function $s = s(t)$ is invertible
on this interval. Let $t = \sigma(s)$ be the inverse (extended to all of $0 \le s \le L$ by setting
$a = \sigma(0)$, $b = \sigma(L)$). Then $\mathbf{y}(s) = \mathbf{x}(\sigma(s))$ is a new parametrization of $C$, called the
parametrization by arc length, or the arc-length parametrization. The variable
$s$ itself is called the arc-length parameter. Because $s'(t) = \|\mathbf{x}'(t)\|$, our mnemonic
for differentials becomes $ds = s'(t)\,dt = \|\mathbf{x}'(t)\|\,dt$, supporting the equality
(Theorem 1.3)

\[ \int_C ds = \int_a^b \|\mathbf{x}'(t)\|\,dt. \]



Let us determine the arc length and the arc-length parametrization of the curve
$C : \mathbf{x}(t) = (t^3/3, t^2/2)$, $0 \le t \le 2$. The arc-length function is the integral (i.e.,
antiderivative) of the speed of the parameter point as it moves along $C$. The velocity of
the moving point is the vector $\mathbf{x}' = (t^2, t)$; therefore its speed is $\sqrt{t^4 + t^2} = t\sqrt{t^2 + 1}$,
and the arc-length function is

\[ s(t) = \int_0^t v\sqrt{v^2 + 1}\,dv = \frac{(v^2 + 1)^{3/2}}{3}\,\Big|_0^t = \frac{(t^2 + 1)^{3/2} - 1}{3}. \]

The length of $C$ is therefore $s(2) = (5^{3/2} - 1)/3 \approx 3.39$. To find the inverse $t = \sigma(s)$
of the arc-length function, we solve $s = s(t)$ for $t$:

\begin{align*}
s &= \frac{(t^2 + 1)^{3/2} - 1}{3}, \\
3s + 1 &= (t^2 + 1)^{3/2}, \\
(3s + 1)^{2/3} &= t^2 + 1, \\
\sqrt{(3s + 1)^{2/3} - 1} &= t.
\end{align*}

Hence the arc-length parametrization of $C$ is

\[ \mathbf{y}(s) = \mathbf{x}(\sigma(s)) = \left( \frac{\bigl((3s + 1)^{2/3} - 1\bigr)^{3/2}}{3},\ \frac{(3s + 1)^{2/3} - 1}{2} \right). \]


To show how $s$ measures arc length, let us locate the points that are $s = 1$, 2, and
3 units along $C$. We have:

  s    t                                   x = t^3/3    y = t^2/2
  1    $\sqrt{4^{2/3} - 1} = 1.23282$      0.625        0.760
  2    $\sqrt{7^{2/3} - 1} = 1.63074$      1.446        1.330
  3    $\sqrt{10^{2/3} - 1} = 1.90829$     2.316        1.821

The three points are plotted on the graph below. Their spacing from the origin
(where $s = 0$) along $C$ appears to match the spacing of the unit segments along
the $x$-axis. Compare this to the uneven spacing of the parameter points $t = 0$, 1,
and 2.
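The formulas of this example are easy to tabulate. The Python sketch below evaluates $\sigma(s)$ and $\mathbf{y}(s)$ at $s = 1, 2, 3$ and estimates the speed $\|\mathbf{y}'(s)\|$ by a central difference; the function names and the step size `h` are choices made for this sketch.

```python
import math


def s_of_t(t):
    """Arc-length function s(t) = ((t^2 + 1)^(3/2) - 1) / 3."""
    return ((t * t + 1) ** 1.5 - 1) / 3


def sigma(s):
    """Inverse of s_of_t: t = sqrt((3s + 1)^(2/3) - 1)."""
    return math.sqrt((3 * s + 1) ** (2 / 3) - 1)


def y_of_s(s):
    """Arc-length parametrization y(s) = x(sigma(s)) with x(t) = (t^3/3, t^2/2)."""
    t = sigma(s)
    return (t ** 3 / 3, t ** 2 / 2)


def speed(s, h=1e-5):
    """Central-difference estimate of ||y'(s)||."""
    (x0, y0), (x1, y1) = y_of_s(s - h), y_of_s(s + h)
    return math.hypot(x1 - x0, y1 - y0) / (2 * h)


for s in (1, 2, 3):
    t = sigma(s)
    x, y = y_of_s(s)
    print(s, round(t, 5), round(x, 3), round(y, 3), round(speed(s), 6))

print(round(s_of_t(2), 2))   # total length of C
```

The estimated speed stays at 1 for each tabulated point, which is what parametrization by arc length should give.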



Theorem 1.4. Suppose $\mathbf{x}(t)$, $a \le t \le b$, and $\mathbf{r}(u)$, $c \le u \le d$, both parametrize the
smooth, simple, oriented curve $C$; then there is a continuously differentiable
parameter change $h : [c,d] \to [a,b]$ for which $\mathbf{r}(u) = \mathbf{x}(h(u))$, $c \le u \le d$.

Proof. Suppose $\mathbf{y}(s)$ is the arc-length parametrization of $C$, and

\[ s(t) = \int_a^t \|\mathbf{x}'(v)\|\,dv, \qquad s(u) = \int_c^u \|\mathbf{r}'(w)\|\,dw. \]

These functions are continuously differentiable on their domains, and so is the
inverse $t = \sigma(s)$ of $s = s(t)$. Therefore,

\[ t = h(u) = \sigma(s(u)) \]

is continuously differentiable on $c \le u \le d$, and inasmuch as

\[ \mathbf{x}(\sigma(s)) = \mathbf{y}(s), \qquad \mathbf{r}(u) = \mathbf{y}(s(u)), \]

we have $\mathbf{x}(h(u)) = \mathbf{x}(\sigma(s(u))) = \mathbf{y}(s(u)) = \mathbf{r}(u)$. □
If we think of the parameter $t$ as measuring time, then $\mathbf{x}'(t)$ gives the velocity of
the point $\mathbf{x}(t)$ as it moves along $C$, and $\|\mathbf{x}'(t)\|$ gives its speed. For the arc-length
parametrization $\mathbf{y}(s) = \mathbf{x}(\sigma(s))$, we therefore have

\[ \mathbf{y}'(s) = \frac{d\mathbf{y}}{ds} = \frac{d\mathbf{x}}{dt}\,\frac{d\sigma}{ds} = \mathbf{x}'(\sigma(s))\,\frac{1}{ds/dt} = \frac{\mathbf{x}'(\sigma(s))}{\|\mathbf{x}'(\sigma(s))\|}, \qquad \|\mathbf{y}'(s)\| = 1. \]

For this reason, $\mathbf{y}$ is also called the unit-speed parametrization of $C$. Furthermore,
$\mathbf{t}(s) = \mathbf{y}'(s)$ is a unit tangent vector for $C$ and points in the direction in which $C$ is
oriented (cf. p. 12).

There is a simple way to reverse a path’s orientation with a parameter change.
Suppose $\mathbf{x}(t)$, $a \le t \le b$, parametrizes $C$. Let $u = a + b - t$; then, as $t$ goes from $a$ to
$b$, $u$ goes from $b$ to $a$. Therefore, although the map

\[ \mathbf{r}(u) = \mathbf{x}(a + b - u), \qquad a \le u \le b, \]

has the same image as $\mathbf{x}$, it induces the opposite orientation on that image. We denote
the oppositely-oriented path as $-C$. Note that the tangent vectors at corresponding
points of $-C$ and $C$ point in opposite directions:

\[ \mathbf{r}'(u) = -\mathbf{x}'(t) \quad \text{when } t = a + b - u. \]

It follows from Theorem 1.4 that any parametrization of $-C$ can be obtained from
$\mathbf{x}(t)$ by a parameter change $t = h(u)$ for which $h'(u) < 0$.

The oriented paths $-C$ and $+C$ have distinct arc-length parameters: they run
along their common path $C$ in opposite directions. By contrast, the paths have the
same arc length. Although this is evident from the definitions, it is worth deducing
the result anew by calculating the integrals using parametrizations. Suppose $\mathbf{x}(t)$,
$a \le t \le b$, parametrizes $+C$, and $\mathbf{r}(u)$, $c \le u \le d$, parametrizes $-C$. Then there is a
parameter change $t = h(u)$ with $h' < 0$, $h(c) = b$, $h(d) = a$. We have

\begin{align*}
\text{arc length of } {+C} &= \int_a^b \|\mathbf{x}'(t)\|\,dt = \int_d^c \|\mathbf{x}'(h(u))\|\,h'(u)\,du \\
&= \int_c^d \|\mathbf{x}'(h(u))\|\,|h'(u)|\,du = \int_c^d \|\mathbf{x}'(h(u))\,h'(u)\|\,du,
\end{align*}

reversing the limits of integration by taking $h'(u) = -|h'(u)|$ into account. But
because $\mathbf{r}(u) = \mathbf{x}(h(u))$, the chain rule gives

\[ \mathbf{r}'(u) = \mathbf{x}'(h(u))\,h'(u), \]

and hence


\[ \int_c^d \|\mathbf{x}'(h(u))\,h'(u)\|\,du = \int_c^d \|\mathbf{r}'(u)\|\,du = \text{arc length of } {-C}. \]

Thus, we can assign to the underlying unoriented path $C$ an arc length equal to the
common arc lengths of $+C$ and $-C$.

Note that the integrand of every path integral we have considered (with the
exception of the integral for arc length) has been a vector function. With such an
integrand, it is essential to pay attention to the relation between the direction of the
vector and the direction (i.e., the orientation) of the integration path. With a scalar
function, there is no similar concern and, as we have seen, arc length is meaningful
for an unoriented path. We now define the integral of a general scalar function over
an unoriented path, illustrating the ideas by using mass density.

Consider a thin wire in the shape of a space curve $C$. Suppose that the mass
density of the wire varies continuously and has the value $\rho(\mathbf{x})$ grams per centimeter
at the point $\mathbf{x}$ on $C$. To estimate the total mass of the wire, partition $C$ by choosing
points $\mathbf{x}_1, \ldots, \mathbf{x}_{k+1}$ that run from one end of $C$ to the other. With a sufficiently
fine partition, the length of the segment from $\mathbf{x}_i$ to $\mathbf{x}_{i+1}$ is approximately $\Delta s_i =
\|\mathbf{x}_{i+1} - \mathbf{x}_i\|$ centimeters, its mass is approximately $\rho(\mathbf{x}_i)\,\Delta s_i$ grams, and thus

\[ \text{total mass of } C \approx \sum_{i=1}^{k} \rho(\mathbf{x}_i)\,\Delta s_i \text{ grams}. \]

This is a Riemann sum for a new kind of path integral that generalizes the arc-length
integral (where $\rho(\mathbf{x}) = 1$). With this path integral, we are then able to write

\[ \text{total mass of } C = \int_C \rho(\mathbf{x})\,ds. \]

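Definition 1.6 Suppose $f(\mathbf{x})$ is a scalar function defined for all $\mathbf{x}$ on an unoriented
simple curve $C$ in $\mathbb{R}^n$; the path integral of $f$ over $C$ is

\[ \int_C f(\mathbf{x})\,ds = \lim_{\substack{k\to\infty \\ \text{mesh}\to 0}} \sum_{i=1}^{k} f(\mathbf{x}_i)\,\Delta s_i, \qquad \text{mesh} = \max_i \Delta s_i = \max_i \|\Delta\mathbf{x}_i\|, \]

if the limit exists when taken over all partitions $\{\mathbf{x}_i\}$ of $C$.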

Theorem 1.5. Suppose $f$ is continuous and $C$ has the smooth parametrization $\mathbf{x}(t)$,
$a \le t \le b$; then the path integral of $f$ over $C$ exists, and

\[ \int_C f(\mathbf{x})\,ds = \int_a^b f(\mathbf{x}(t))\,\|\mathbf{x}'(t)\|\,dt. \]

Proof. Adapt the argument that was used to prove Theorem 1.3. □

The theorem implies that the value of the scalar path integral is independent of
the parametrization of $C$; even oppositely oriented parametrizations give the same
value. For example, let us evaluate

\[ \int_C yz\,ds \]

when $C$ is the helix parametrized as $\mathbf{x}(t) = (\cos t, \sin t, t)$, $0 \le t \le 2\pi$. Because
$\|\mathbf{x}'(t)\| = \sqrt{2}$ and $yz = t\sin t$, we have

\[ \int_C yz\,ds = \int_0^{2\pi} t\sin(t)\,\sqrt{2}\,dt = -2\pi\sqrt{2}. \]

By setting $u = 2\pi - t$, and thus $t = 2\pi - u$, we get the opposite parametrization
$\mathbf{r}(u) = \mathbf{x}(2\pi - u)$:

\[ \mathbf{r}(u) = (\cos(2\pi - u), \sin(2\pi - u), 2\pi - u) = (\cos u, -\sin u, 2\pi - u). \]

Then $\|\mathbf{r}'(u)\| = \sqrt{2}$ and $yz = (u - 2\pi)\sin u$, so

\begin{align*}
\int_C yz\,ds &= \int_0^{2\pi} (u - 2\pi)\sin(u)\,\sqrt{2}\,du \\
&= \sqrt{2}\int_0^{2\pi} u\sin(u)\,du - 2\pi\sqrt{2}\int_0^{2\pi} \sin(u)\,du.
\end{align*}

The second integral is zero, so the two evaluations of $f(x,y,z) = yz$ over the helix
agree.
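The agreement can be confirmed numerically. The Python sketch below pulls the scalar integral back through both parametrizations of the helix; the helper names (`simpson`, `scalar_path_integral`, and so on) are assumptions of this sketch.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def scalar_path_integral(f, curve, speed, a, b):
    """Pullback of the scalar path integral: integral of f(x(t)) * ||x'(t)|| dt."""
    return simpson(lambda t: f(*curve(t)) * speed(t), a, b)


def f(x, y, z):
    return y * z


def helix(t):
    return (math.cos(t), math.sin(t), t)


def reversed_helix(u):
    return helix(2 * math.pi - u)   # opposite orientation, same image


def speed(t):
    return math.sqrt(2)             # ||x'(t)|| = ||r'(u)|| = sqrt(2) for both maps


I_forward = scalar_path_integral(f, helix, speed, 0.0, 2 * math.pi)
I_reversed = scalar_path_integral(f, reversed_helix, speed, 0.0, 2 * math.pi)
expected = -2 * math.pi * math.sqrt(2)

print(I_forward, I_reversed, expected)
```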

We can even express a vector integral over an oriented path (e.g., the work done
by a force) as the new kind of scalar integral over that path without its orientation.
Keeping in mind that the scalar path integral uses the “element of arc length” $ds$
(Definition 1.5), let us parametrize the oriented curve $C$ by arc length: $\mathbf{x} = \mathbf{y}(s)$,
$0 \le s \le L$. Then, for the vector function $\mathbf{F}(\mathbf{x})$,

\[ \int_{\vec{C}} \mathbf{F}\cdot d\mathbf{x} = \int_0^L \mathbf{F}(\mathbf{y}(s))\cdot\mathbf{y}'(s)\,ds = \int_0^L \mathbf{F}(\mathbf{y}(s))\cdot\mathbf{t}(s)\,ds, \]

where $\mathbf{t}(s) = \mathbf{y}'(s)$ is the unit tangent vector at the point $\mathbf{y}(s)$ on $C$; its direction
indicates the orientation of $C$. But the last integral is a way to evaluate the scalar
function $\mathbf{F}\cdot\mathbf{t}$ over the unoriented curve $C$; in other words, we have

\[ \int_{\vec{C}} \mathbf{F}\cdot d\mathbf{x} = \int_C \mathbf{F}\cdot\mathbf{t}\,ds. \]

The path on the right is unoriented; the information about the orientation of $C$ has
been transferred to the integrand, into the vector $\mathbf{t}$. To confirm this, let $\mathbf{t}_+$ denote the
unit tangent for $+C$; then $\mathbf{t}_- = -\mathbf{t}_+$ is the unit tangent for $-C$, and we have

\[ \int_{-C} \mathbf{F}\cdot d\mathbf{x} = \int_C \mathbf{F}\cdot\mathbf{t}_-\,ds = \int_C \mathbf{F}\cdot(-\mathbf{t}_+)\,ds = -\int_{+C} \mathbf{F}\cdot d\mathbf{x}, \]

as we should.

The scalar $\mathbf{F}\cdot\mathbf{t}$ gives the value of the (signed) projection of $\mathbf{F}$ along $\mathbf{t}$; we call it
the tangential component of $\mathbf{F}$ along the (oriented) curve $C$. Thus, the work done by
a force displacing an object along an oriented curve is the scalar path integral of
the tangential component of that force.


1.3 Polar coordinates 

Statistics deals with random variables that take on values in a given range with a 
certain probability. For example, the weights of a 5-ounce bar of soap coming off a 
production line may be thought of as values of the random variable X because the 
manufacturing process introduces small fluctuations in weight from bar to bar. Few 
bars will weigh exactly 5 ounces, but most will have weights close to 5 ounces, some 
a little higher, some a little lower. The manufacturer can expect that the weights X 
will be dispersed around the central value (here, 5 ounces) in a certain predictable 
way. 

For many random variables like $X$, the dispersion follows what is called a normal
distribution. If $X_{\mu,\sigma}$ is a random variable that follows a normal distribution with
mean $\mu$ (its central value) and standard deviation $\sigma$ (its measure of dispersion), then
the probability that the value of $X_{\mu,\sigma}$ lies between $a$ and $b$ is equal to the fraction of
the area under the entire graph of

\[ y = g_{\mu,\sigma}(x) = e^{-(x-\mu)^2/2\sigma^2} \]

that lies between the vertical lines $x = a$ and $x = b$. In other words,

\[ \operatorname{Prob}(a \le X_{\mu,\sigma} \le b) = \frac{\text{area under graph of } g_{\mu,\sigma} \text{ between } a \text{ and } b}{\text{area under entire graph of } g_{\mu,\sigma}}. \]

These areas are integrals, of course, but the antiderivative of $g_{\mu,\sigma}(x)$ is not one of
the elementary functions of the introductory calculus course, so the integrals cannot
be found by the usual techniques. Nevertheless, there is a way to find the exact value
of the entire area when $\mu = 0$, $\sigma = 1$; it is

\[ I = \int_{-\infty}^{\infty} g_{0,1}(x)\,dx = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}, \]

as we now show by an ingenious use of polar coordinates.

The idea is to work with two copies of $I$ and compute $I^2$ instead of $I$, using a new
“dummy” variable of integration in the second copy of $I$. With the rule $e^A e^B = e^{A+B}$,
we then combine the two integrals into one double (iterated) integral:

\begin{align*}
I^2 &= \left( \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right) \left( \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \right) \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-x^2/2}\,e^{-y^2/2}\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy.
\end{align*}

There is still no convenient antiderivative, but now make a change to polar
coordinates, $x = r\cos\theta$, $y = r\sin\theta$. The new limits of integration are then $0 \le \theta \le 2\pi$,
$0 \le r < \infty$, and $dx\,dy$ becomes $r\,dr\,d\theta$. This is the key because it introduces a new
factor $r$ into the integrand, and with this new factor, the integrand $r e^{-r^2/2}$ does have
a simple antiderivative, namely $-e^{-r^2/2}$:

\[ I^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{2\pi} \Bigl[ -e^{-r^2/2} \Bigr]_0^{\infty}\,d\theta = \int_0^{2\pi} d\theta = 2\pi. \]

Hence $I = \sqrt{2\pi}$.
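The value $\sqrt{2\pi}$ can be checked numerically. The Python sketch below truncates the infinite ranges at 12, an assumption justified by how small $e^{-72}$ is, and compares $I^2$ with the polar form $2\pi\int_0^\infty r e^{-r^2/2}\,dr$. The `simpson` helper is written out here only to keep the sketch self-contained.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


CUTOFF = 12.0   # stands in for infinity; the neglected tails are below e^(-72)

# One-variable integral I over (-CUTOFF, CUTOFF).
I = simpson(lambda x: math.exp(-x * x / 2), -CUTOFF, CUTOFF)

# Polar form of I^2: the theta-integral contributes 2*pi, the r-integral the rest.
radial = simpson(lambda r: r * math.exp(-r * r / 2), 0.0, CUTOFF)
I_squared_polar = 2 * math.pi * radial

print(I, math.sqrt(2 * math.pi))
print(I * I, I_squared_polar)
```

The extra factor $r$ is what makes the radial integrand an exact derivative, so the radial integral is very nearly 1.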

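In the exercises you are asked to show that

$$\int_{-\infty}^{\infty} g_{\mu,\sigma}(x)\,dx = \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = \sigma\sqrt{2\pi},$$

which implies

$$\operatorname{Prob}(a \le X_{\mu,\sigma} \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-(x-\mu)^2/2\sigma^2}\,dx.$$

If we now combine the factor outside the integral with the integrand function $g_{\mu,\sigma}(x)$
to form the new function

$$f_{\mu,\sigma}(x) = \frac{e^{-(x-\mu)^2/2\sigma^2}}{\sigma\sqrt{2\pi}},$$

we get the more usual formula for the density function of the normal distribution
with mean $\mu$ and standard deviation $\sigma$; that is, using $f_{\mu,\sigma}$ we have simply

$$\operatorname{Prob}(a \le X_{\mu,\sigma} \le b) = \text{area under the graph of } f_{\mu,\sigma} \text{ from } a \text{ to } b.$$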
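The area formula can be tried out numerically. The sketch below is an illustration only: the helper names and the sample values $\mu = 5$, $\sigma = 2$, $a = 3$, $b = 7$ are assumptions, and the comparison uses the error function `math.erf` from Python's standard library, which the text itself does not mention.

```python
import math

def normal_density(x, mu, sigma):
    """The density f_{mu,sigma}(x) = exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def prob_between(a, b, mu, sigma, steps=100_000):
    """Midpoint-rule estimate of the area under the density from a to b."""
    h = (b - a) / steps
    return h * sum(normal_density(a + (k + 0.5) * h, mu, sigma) for k in range(steps))

def prob_between_erf(a, b, mu, sigma):
    """The same probability expressed through the library error function."""
    def scaled(x):
        return (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (math.erf(scaled(b)) - math.erf(scaled(a)))

# Probability of landing within one standard deviation of the mean.
p_numeric = prob_between(3.0, 7.0, mu=5.0, sigma=2.0)
p_erf = prob_between_erf(3.0, 7.0, mu=5.0, sigma=2.0)
print(p_numeric, p_erf)
```

Both numbers come out near 0.683, the familiar "within one standard deviation" probability.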

To what extent is the change to polar coordinates like the coordinate changes we
have seen in single-variable integrals? For example, in the transformation from $dx\,dy$
to $r\,dr\,d\theta$, does the factor $r$ play the same role as the factor $\varphi'(s)$ in the pullback
from $dx$ to $\varphi'(s)\,ds$? In that case, does $r$ represent a multiplier, as $\varphi'(s)$ does in
the microscope equation $\Delta x \approx \varphi'(s)\,\Delta s$? Furthermore, is there a microscope equation
(linear approximation) for the polar coordinate change map $M : (r,\theta) \mapsto (x,y)$? If
so, is this microscope equation the source of the transformation of differentials, as it
is in the one-variable case? And does the multiplier in this new microscope equation
involve the derivatives of $x$ and $y$ with respect to $r$ and $\theta$? We explore these questions
in the coming chapters.


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1 Starting Points 


Exercises 


1.1. Evaluate 


1.2. Determine 


im J T 


dx 


x 
xdx 


v-2- 


Which type of substitution did you use? 


. +x 2 

1.3. Carry out a change of variables to evaluate the integral

$$\int_{-R}^{R} \sqrt{R^2 - x^2}\,dx.$$

(This is the area of a semicircle of radius $R$, and therefore has the value
$\pi R^2/2$.) Which type of substitution did you use, pullback or push-forward?

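1.4. Determine $\displaystyle\int \frac{\arctan x}{1+x^2}\,dx$ and show $\displaystyle\int_0^{\infty} \frac{\arctan x}{1+x^2}\,dx = \frac{\pi^2}{8}$. Which type of substitution did you use?

1.5. a. Determine $\displaystyle\int \frac{dw}{w(\ln w)^p}$.
b. Evaluate $I = \displaystyle\int \frac{dw}{w(\ln w)^p}$ (limits of integration not recoverable here); for which values of $p$ is $I$ finite?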

1.6. State a condition that guarantees a function $x = \varphi(s)$ has an inverse. Then use
your condition to decide whether each of the following functions is invertible.
When possible, find a formula for the inverse of each function that is
invertible.


a. x = 1 /s. 

b. x = s + s 3 . 


C. X = 


1 -M 2 ’ 

d. x = sinhs = 
s 


e — e 


e. x = 


f. x = ms + b. 

g. x = coshs = 

h. x = s — s 3 . 

i. x = tanh^ = 

1 —s 


e +e 

lT 

sinhs' 

coshs' 


V l — s 2 


J. x = 


1 +J 


1.7. a. Obtain formulas for $f(s) = \cos(\arcsin s)$ and $g(s) = \tan(\arcsin s)$ directly
as functions of $s$ that involve neither trigonometric nor inverse trigonometric
functions. Your answers will involve the square root function and
polynomial expressions in $s$.

b. Compute the derivative of $\cos(\arcsin s)$ using the chain rule and the derivatives
of $\cos u$ and $\arcsin s$. Then compute the derivative of $f(s)$ using
your expression in part (a). Compare the two derivatives. Do the same for
$\tan(\arcsin s)$ and $g(s)$.


1.8. Use $x = \arcsin s$ to show $\displaystyle\int \cos^3 x\,dx = \sin x - \frac{\sin^3 x}{3}$.

1.9. a. Write the microscope equation (i.e., the linear approximation) for $\varphi(s) = \sqrt{s}$
at $s = 100$.

b. Use the microscope equation from part (a) to estimate $\sqrt{102}$ and $\sqrt{99.4}$.

c. How far are your estimates from those given by a calculator?

d. Your estimates should be greater than the calculator values; use the graph
of $x = \varphi(s)$ to explain why this is so.

1.10. a. Write the microscope equation for $\varphi(s) = 1/s$ at $s = 2$ and use it to estimate
$1/2.03$ and $1/1.98$.

b. How far are your estimates from the values given by a calculator?

c. Your estimates should be less than the calculator values; use the graph of
$x = \varphi(s)$ to explain why this is so.

1.11. Show that $\sqrt{1 + 2h} \approx 1 + h$ when $h \approx 0$.

1.12. a. Determine the microscope equation for $x = \tan s$ at $s = \pi/4$.

b. Show that $\tan(h + \pi/4) \approx 1 + 2h$ when $h \approx 0$. Is this estimate larger or
smaller than the true value? Explain why.

1.13. Determine the local length multiplier for $x = \sin s$ at each of the points $s = 0$,
$s = \pi/4$, $s = \pi/2$, $s = 2\pi/3$, and $s = \pi$.

1.14. What is true about the map $\varphi : s \mapsto x$ at a point $s_0$ where the local length
multiplier is negative?

1.15. Consider the hyperbolic sine and hyperbolic cosine functions, $\sinh s$ and
$\cosh s$, as defined in Exercise 1.6. Show each is the derivative of the other,
and show

$$\cosh^2 s - \sinh^2 s = 1 \quad \text{for all } s.$$



1.16. Use the substitution $x = \sinh s$ to determine


1.17. Determine the work done by the constant force $\mathbf{F} = (2,-3)$ in displacing an
object along (a) $\Delta\mathbf{x} = (1,2)$; (b) $\Delta\mathbf{x} = (1,-2)$; (c) $\Delta\mathbf{x} = (-1,0)$.

1.18. Determine the work done by the constant force $\mathbf{F} = (7,-1,2)$ in displacing
an object along (a) $\Delta\mathbf{x} = (0,1,1)$; (b) $\Delta\mathbf{x} = (1,-2,0)$; (c) $\Delta\mathbf{x} = (0,0,1)$.

1.19. Suppose a constant force $\mathbf{F}$ in the plane does 7 units of work in displacing an
object along $\Delta\mathbf{x} = (2,-1)$ and $-3$ units of work along $\Delta\mathbf{x} = (4,1)$. How much
work does $\mathbf{F}$ do in displacing an object along $\Delta\mathbf{x} = (1,0)$? Along $\Delta\mathbf{x} = (0,1)$?
Find a nonzero displacement $\Delta\mathbf{x}$ along which $\mathbf{F}$ does no work.

1.20. Let $W(\mathbf{F}, \Delta\mathbf{x})$ be the work done by the constant force $\mathbf{F}$ along the linear
displacement $\Delta\mathbf{x}$. Show that $W$ is a linear function of the vectors $\mathbf{F}$ and $\Delta\mathbf{x}$.

1.21. Suppose $\mathbf{F} = (P,Q)$. Determine the unit displacements $\Delta\mathbf{u}$ (i.e., $\|\Delta\mathbf{u}\| = 1$)
that yield the maximum and the minimum values of $W$.

1.22. Suppose the constant force $\mathbf{F} = (P,Q)$ does the work $A$ along the displacement
$(a,c)$ and the work $B$ along the displacement $(b,d)$. Determine $P$ and $Q$. What
condition (on $a$, $b$, $c$, and $d$) must be satisfied for $P$ and $Q$ to be found?

1.23. a. Sketch the curve in the $(x,y)$-plane given parametrically as

$$x = \frac{2t}{1+t^2}, \qquad y = \frac{1-t^2}{1+t^2}.$$

In particular, label the points where $t = -2, -1, 0, +1, +2$.

b. Each of the following limits exists; determine the location of each as a 
point in the (x,y)-plane: 




c. Compute $a = x(t)^2 + y(t)^2$; your result should be a constant (i.e., independent
of $t$) that is consistent with the sketch of the curve you made in part
(a). What is the curve and how does $a$ relate to it?

1.24. Determine the work done by the force field $\mathbf{F}$ in moving a particle along the
oriented curve $C$, where:

a. $\mathbf{F} = (x, 3y)$, $C : (t^2, t^3)$, $1 \le t \le 2$.

b. $\mathbf{F} = (-y, x)$, $C$ : semicircle of radius 2 at origin, counterclockwise from
$(2,0)$ to $(-2,0)$.

c. $\mathbf{F} = (y, x)$, $C$ : any path from $(5,2)$ to $(7,11)$.

d. $\mathbf{F} = (0, 0, -mg)$, $C : (2t, t, 4 - t^2)$, $0 \le t \le 1$.

e. $\mathbf{F} = (-y, x, 1)$, $C : (\cos\theta, \sin\theta, 3\theta)$, $0 \le \theta \le A$.

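1.25. Determine $\displaystyle\int_C \mathbf{F}\cdot d\mathbf{x}$ when

a. $\mathbf{F} = (x + 2y, x - y)$, $C$ : straight line from $(-2,3)$ to $(1,7)$.

b. $\mathbf{F} = (xy, z, x)$, $C : (t^2, t, 1 - t)$, $0 \le t \le 1$.

c. $\mathbf{F} = \left(\dfrac{y}{x^2+y^2}, \dfrac{x}{x^2+y^2}\right)$, $C : (R\cos t, R\sin t)$, $0 \le t \le 8\pi$.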

1.26. Let $C$ be the arc of the circle of radius 2 centered at the origin, oriented
counterclockwise from $(1, -\sqrt{3})$ to $(1, \sqrt{3})$.

a. Show that $\mathbf{r}(u) = \left(\dfrac{4u}{u^2+1}, \dfrac{2u^2-2}{u^2+1}\right)$, $2 - \sqrt{3} \le u \le 2 + \sqrt{3}$, parametrizes $C$.
(Cf. Example 1, p. 10, and Exercise 1.23, above.)

b. Using $\mathbf{r}$ to parametrize $C$, determine $\displaystyle\int_C x\,dy$.

We recall here some ideas introduced in the first course in multivariable calculus.
Suppose that either

(a) $\displaystyle\oint_C \mathbf{F}\cdot d\mathbf{x} = 0$ for all closed paths $C$, or (b) $\displaystyle\int_{C_1} \mathbf{F}\cdot d\mathbf{x} = \int_{C_2} \mathbf{F}\cdot d\mathbf{x}$

for every pair of oriented paths $C_1$ and $C_2$ that start at the same point $A$ and end at
the same point $B$. Then the vector field $\mathbf{F}$ is said to be conservative if (a) is true, and
path-independent if (b) is true. It can be shown that a continuous and everywhere-defined
vector field is conservative if and only if it is path-independent. The function
$\Phi(\mathbf{x})$ is a potential for the vector field $\mathbf{F}(\mathbf{x})$ if $\mathbf{F}(\mathbf{x}) = \operatorname{grad}\Phi(\mathbf{x})$ for all $\mathbf{x}$. If $\mathbf{F}$ has a
potential, then it is path-independent and (see Exercise 4.36.b, p. 149)

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \Phi(\mathbf{x})\,\Big|_{\text{start of } C}^{\text{end of } C}.$$
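The relation between a potential and path independence can be illustrated numerically. The Python sketch below is a hedged illustration, not something taken from the exercises: the potential $\Phi(x,y) = x^2 y$, the two paths from $(1,1)$ to $(3,2)$, and the helper names are all hypothetical choices. It approximates the work integral along each path and compares the results with the difference of potential values at the endpoints.

```python
import math

def potential(x, y):
    """A hypothetical potential Phi(x, y) = x^2 * y chosen for illustration."""
    return x * x * y

def field(x, y):
    """The vector field F = grad Phi = (2xy, x^2)."""
    return (2.0 * x * y, x * x)

def work(path, steps=20_000):
    """Approximate the integral of F . dx along path(t), 0 <= t <= 1,
    by summing F (evaluated at segment midpoints) dotted with small displacements."""
    total = 0.0
    for k in range(steps):
        x0, y0 = path(k / steps)
        x1, y1 = path((k + 1) / steps)
        fx, fy = field(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

def straight(t):
    """Straight segment from (1, 1) to (3, 2)."""
    return (1.0 + 2.0 * t, 1.0 + t)

def curved(t):
    """A curved path with the same endpoints."""
    return (1.0 + 2.0 * t * t, 1.0 + math.sin(math.pi * t / 2.0))

w_straight = work(straight)
w_curved = work(curved)
expected = potential(3.0, 2.0) - potential(1.0, 1.0)   # 18 - 1 = 17

print(w_straight, w_curved, expected)
```

Both approximations should agree with the potential difference up to discretization error, whichever path is used.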


1.27. a. In a coordinate system $(x,y,z)$ where the $z$-axis is vertical, the gravitational
force field at the surface of the earth can be written as $\mathbf{F} = (0, 0, -gm)$,
where $g$ is the acceleration due to gravity and $m$ is the mass of a falling
object. (Note that $g$ and $m$ are both constant.) Show that $\Phi(x,y,z) = -gmz$
is a potential function for $\mathbf{F}$, demonstrating that $\mathbf{F}$ is a conservative field.

b. What is the work done by gravity in moving an object of mass $m$ from the
point $(a,b,c)$ to $(\alpha,\beta,\gamma)$? Is this negative if $c < \gamma$? What is the meaning
of “negative” work?

c. What is the net work done by gravity in moving an object of mass $m$ from
the point $(a,b,c)$ to another point $(\alpha,\beta,c)$ at the same vertical height as
the first? (What does it mean to say that the earth’s gravitational field is
conservative at the surface of the earth?)

1.28. If $\mathbf{x} = (x,y,z)$ is the position of a planet in terms of a coordinate system centered
at the sun, then the force of the sun’s gravity on the planet is given by
$\mathbf{F}(\mathbf{x}) = -\mu\mathbf{x}/r^3$, where $\mu$ is a constant (depending on the mass of the sun and
of the planet), and $r = \|\mathbf{x}\|$.

a. Write $\mathbf{F}$ explicitly in terms of the space variables $x$, $y$, and $z$.

b. Show that the gravitational force obeys the “inverse square” law: $\|\mathbf{F}\| = \mu/r^2$.

c. Write $\Phi(\mathbf{x}) = \mu/r$ explicitly in terms of the space variables, and show that
$\Phi$ is a potential for $\mathbf{F}$: $\operatorname{grad}\Phi = \mathbf{F}$. This demonstrates that the gravitational
field is conservative.

d. Suppose we choose a unit for distance in such a way that $r = 10$ when our
planet is farthest from the sun (called aphelion) and $r = 3$ when it is closest
to the sun (perihelion). How much work does the sun’s gravitational field
do in moving the planet from aphelion to perihelion?




e. What is the net work done on the planet by the sun’s gravity when the 
planet traverses one complete orbit, from aphelion back to aphelion? (What 
does it mean to say that the sun’s gravitational field is conservative ?) 


1.29. Determine the arc length of each of the following curves.

a. $\mathbf{x}(t) = (3t^2, 4t^2)$, $0 \le t \le 1$.

b. $\mathbf{x}(t) = (t^2, t^2)$, $1 \le t \le 3$.

c. $\mathbf{x}(t) = (e^t\cos t, \sin t)$, $a \le t \le b$.

d. $\mathbf{x}(t) = (\cos t, \sin t, kt)$, $0 \le t \le 2\pi$.

e. $\mathbf{x}(t) = \left(\dfrac{1-t^2}{1+t^2}, \dfrac{2t}{1+t^2}\right)$, $-1 \le t \le 1$.

f. The ellipse $16x^2 + 9y^2 = 144$. (Suggestion: Use numerical integration.)

1.30. a. Determine the arc-length function $s(t)$ for the circle $C$ of radius $R$ parametrized
as $\mathbf{x}(t) = (R\cos t, R\sin t)$.

b. Determine the inverse $t = \sigma(s)$ of the arc-length function and then the
corresponding arc-length parametrization $\mathbf{y}(s) = \mathbf{x}(\sigma(s))$.

1.31. Determine the arc-length function $s(t)$ (with $s(0) = 0$) and the arc-length
parametrization $\mathbf{y}(s)$ of the curve parametrized as

$$\mathbf{x}(t) = \left(\frac{1-t^2}{1+t^2}, \frac{2t}{1+t^2}\right), \qquad -\infty < t < \infty.$$


1.32. Let C be a thin wire formed into the circle of radius R cm centered at the 
origin. Suppose the mass density of the wire at the point (x,y) is 1 +x 2 gm/cm. 
Determine the total mass of the wire. 

1.33. Let $C$ be the helix $(x,y,z) = (\cos t, \sin t, t)$, $0 \le t \le 4\pi$, and let $s$ be arc length
on C. Determine 

/ zds and / r 2 ds. 

Jc Jc 

1.34. Let C be the circle of radius 5 centered at the point (4, —3), and let s be the 
arc-length parameter along C, as measured counterclockwise from the origin. 
Propose a definition for the path integrals 


l a *" b ” d /c“ s2s *’ 

and then determine their values. 

1.35. Is the change from Cartesian to polar coordinates either a pullback or a push- 
forward substitution, or is it some new type? 

1.36. a. Sketch the region $D$ that lies in the first quadrant in the $(x,y)$-plane between
the circles $x^2 + y^2 = 1$ and $x^2 + y^2 = 10$.
b. Describe $D$ in polar coordinates.
c. Change to polar coordinates to evaluate the double integral

$$\iint_D \sin(x^2 + y^2)\,dx\,dy.$$


1.37. Let $g_{\mu,\sigma}(x) = e^{-(x-\mu)^2/2\sigma^2}$, as in the text.

a. Show $g_{\mu,\sigma}$ takes its maximum at $x = \mu$ and the graph of $g_{\mu,\sigma}$ has inflection
points at $x = \mu \pm \sigma$. Sketch the graph of $z = g_{\mu,\sigma}(x)$ for $\mu - 3\sigma \le x \le \mu + 3\sigma$.
Do this first with $\mu = 5$ and $\sigma = 2$ and then symbolically with
general values for $\mu$ and $\sigma$.

b. Without repeating the argument in the text that showed $I = \sqrt{2\pi}$, show that

$$\int_{-\infty}^{\infty} g_{\mu,\sigma}(x)\,dx = \sigma\sqrt{2\pi}.$$

You can do this by making an appropriate change of variable that converts
this integral to one you can evaluate knowing only that $I = \sqrt{2\pi}$.

1.38. a. Sketch together in the same coordinate plane the graphs of $y = f_{0,\sigma}(x)$ with
$\sigma = \tfrac12$, $\sigma = 1$, and $\sigma = 3$. (Use $\mu = 0$ for each.) How do the maximum
height and the “width” of the graph vary with $\sigma$?
b. Sketch together the graphs of $y = f_{\mu,1}(x)$ with $\mu = -2$, $\mu = 1$, and $\mu = 10$.
How do these graphs vary with changing $\mu$?


The various probabilities,

$$\operatorname{Prob}(a \le X_{\mu,\sigma} \le b) = \text{area under } \frac{e^{-(x-\mu)^2/2\sigma^2}}{\sigma\sqrt{2\pi}} \text{ from } a \text{ to } b,$$

associated with a normal random variable cannot be computed in terms of the
antiderivatives of standard functions. However, because it is important to have these
values, strategies have been devised to get access to them. Here is the first: Assume
(only for the sake of simplicity) that $0 < a$; then

$$\operatorname{Prob}(a \le X_{\mu,\sigma} \le b) = \operatorname{Prob}(0 \le X_{\mu,\sigma} \le b) - \operatorname{Prob}(0 \le X_{\mu,\sigma} \le a).$$

In other words, it is sufficient to calculate only $\operatorname{Prob}(0 \le X_{\mu,\sigma} \le b)$ for various
values of $b$. The following is the second strategy.

1.39. Suppose $Z_{0,1}$ is a normal random variable with mean 0 and standard deviation 1.
Continue to assume $X_{\mu,\sigma}$ is a normal random variable with mean $\mu$
and standard deviation $\sigma$. Show that

$$\operatorname{Prob}(0 \le X_{\mu,\sigma} \le b) = \operatorname{Prob}(0 \le Z_{0,1} \le (b-\mu)/\sigma).$$

Suggestion: Consider the push-forward substitution $z = (x - \mu)/\sigma$ and use it
to show that

$$\frac{1}{\sigma\sqrt{2\pi}} \int_0^{b} e^{-(x-\mu)^2/2\sigma^2}\,dx = \frac{1}{\sqrt{2\pi}} \int_0^{(b-\mu)/\sigma} e^{-z^2/2}\,dz.$$


The last result implies that it is sufficient to calculate (e.g., by numerical integration)
the values

$$P(z_0) = \operatorname{Prob}(0 \le Z_{0,1} \le z_0)$$

for various numbers $z_0 > 0$. In other words, we need only know the distribution of
one very special normal random variable, $Z_{0,1}$; all others can be calculated from
it. The values $P(z_0)$ are sometimes called “z-scores”; the probability that a given
normal random variable lies in a given range reduces to knowing certain z-scores.
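The effect of the substitution $z = (x - \mu)/\sigma$ can be checked numerically. In the sketch below (an illustration under stated assumptions: the midpoint rule, the helper names, and the sample values $\mu = 5$, $\sigma = 2$, $b = 8$ are not from the text), the interval $a \le x \le b$ is carried to $(a-\mu)/\sigma \le z \le (b-\mu)/\sigma$; the lower limit is taken to be $a = \mu$ so that the $z$-integral starts at 0 and is therefore a value $P(z_0)$.

```python
import math

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule estimate of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

mu, sigma = 5.0, 2.0     # sample mean and standard deviation (assumed values)
a, b = mu, 8.0           # interval of x-values, starting at the mean

def density_x(x):
    """Density of the normal variable with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def density_z(z):
    """Density of the standard normal variable (mean 0, standard deviation 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

z_lo, z_hi = (a - mu) / sigma, (b - mu) / sigma   # 0.0 and 1.5

prob_x = integrate(density_x, a, b)
prob_z = integrate(density_z, z_lo, z_hi)
print(prob_x, prob_z)
```

The two estimates should agree, which is the sense in which a single table of values for the standard variable suffices.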


1.40. For simplicity, we assumed that $a > 0$ when we reduced probabilities for
$X_{\mu,\sigma}$ to certain z-scores. This assumption is not necessary; describe how to
remove it.




Chapter 2 

Geometry of Linear Maps 


Abstract The geometric meaning of a linear function $x \mapsto y = mx$ is simple and
clear: it maps $\mathbb{R}^1$ to itself, multiplying lengths by the factor $m$. As we show, linear
maps $M : \mathbb{R}^n \to \mathbb{R}^n$ also have their multiplication factors of various sorts, for any
$n > 1$. In later chapters, these factors play a role in transforming the differentials in
multiple integrals that is exactly like the role played by the multiplier $\varphi'(s)$ in the
transformation $dx = \varphi'(s)\,ds$ in single-variable integrals. With this in mind, we take
up the geometry of linear maps in the simplest case of two variables.


2.1 Maps from $\mathbb{R}^2$ to $\mathbb{R}^2$


Some examples $M : (u,v) \mapsto (x,y)$ illustrate the possibilities that we face.

$$M_1 : \begin{cases} x = 2u, \\ y = \tfrac35 v, \end{cases} \qquad M_1 = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac35 \end{pmatrix}.$$

This map carries horizontal lines to horizontal lines and multiplies horizontal
lengths by 2. It carries vertical lines to vertical lines and multiplies vertical lengths
by $\tfrac35$. These lines are special: they are the only ones whose directions are left
unchanged by the map. (For example, the image of a line with slope $\Delta v/\Delta u = 1$ has
the different slope $\Delta y/\Delta x = \tfrac35\Delta v / 2\Delta u = 3/10$. See the exercises.) A grid of unit


J.J. Callahan, Advanced Calculus: A Geometric View, Undergraduate Texts in Mathematics, 
DOI 10.1007/978-1-4419-7332-0 2, © Springer Science+Business Media, LLC 2010 
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2 Geometry of Linear Maps 


squares in the $(u,v)$-plane is mapped to a grid of rectangles in the $(x,y)$-plane, and
the sides of the rectangles are parallel to the sides of the squares. Finally, orientation
is preserved: a counterclockwise circuit around the unit square in the $(u,v)$-plane
maps to a counterclockwise circuit of its image rectangle in the $(x,y)$-plane.


Our second example is also quite simple in form; it is a pure reflection across the
horizontal axis:

$$M_2 : \begin{cases} x = u, \\ y = -v, \end{cases} \qquad M_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

The horizontal and vertical lines are still the invariant ones, and this time even
lengths on them are unchanged. Vertical lines are reversed in direction, though,
because the vertical multiplier is $-1$. Orientation of the whole plane is therefore
reversed: the counterclockwise circuit in the $(u,v)$-plane has a clockwise image in
the $(x,y)$-plane. Note that $M_1$ and $M_2$ are both diagonal matrices, and the multipliers
are their diagonal elements.


Our third example, although still simple in form, introduces a new action: rotation,

$$M_3 : \begin{cases} x = -2v, \\ y = 2u, \end{cases} \qquad M_3 = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}.$$

Consider the effect $M_3$ has on a unit vector $\mathbf{u}$ that makes an angle $\theta$ with the positive
horizontal axis:

$$M_3(\mathbf{u}) = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}
= \begin{pmatrix} -2\sin\theta \\ 2\cos\theta \end{pmatrix}
= 2\begin{pmatrix} \cos(\theta + \pi/2) \\ \sin(\theta + \pi/2) \end{pmatrix}.$$

Thus, $M_3(\mathbf{u})$ is two units long and makes an angle $\theta + \pi/2$ with the horizontal
axis. (You should check that $-\sin\theta = \cos(\theta + \pi/2)$ and $\cos\theta = \sin(\theta + \pi/2)$.)
Every unit vector, and therefore every nonzero vector, is rotated by $\pi/2$. For this
linear map, no line is special in the sense that it is preserved with at most a change
in length, so there are no length multipliers. Nevertheless, $M_3$ doubles the length
of every vector and it preserves orientation. It is the combination of a rotation (by
90°) and a uniform dilation (by a factor of 2), as the following figure shows. Any
combination of a rotation with a uniform dilation is a linear map of the plane to
itself.
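The claim that $M_3$ doubles lengths and turns every vector by $\pi/2$ is easy to test on a few unit vectors. The following Python sketch is an illustration only; the helper `apply` and the sample angles are assumptions, not part of the text.

```python
import math

M3 = ((0.0, -2.0), (2.0, 0.0))   # the matrix of the third example

def apply(matrix, vec):
    """Multiply a 2 x 2 matrix (tuple of rows) by a vector (u, v)."""
    (a, b), (c, d) = matrix
    u, v = vec
    return (a * u + b * v, c * u + d * v)

results = []
for degrees in (0, 30, 75, 200):
    theta = math.radians(degrees)
    x, y = apply(M3, (math.cos(theta), math.sin(theta)))
    length = math.hypot(x, y)                             # length of the image
    turn = (math.atan2(y, x) - theta) % (2.0 * math.pi)   # angle turned through
    results.append((length, turn))
    print(degrees, length, math.degrees(turn))
```

For each sample angle the image has length 2 and the turn is 90°, up to rounding.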





































2.1 Maps from R 2 to R 2 
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M 3 
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X 








The maps $M_1$ and $M_2$ have similarities not shared with $M_3$; $M_1$ and $M_2$ are what we
call strains. The next two matrices provide us further examples of strains.

Example 4 has a more complicated formula than the previous ones, but we ultimately
show that it is as simple geometrically as the first two.

$$M_4 : \begin{cases} x = u + 2v, \\ y = 2u + v, \end{cases} \qquad M_4 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.$$

Neither horizontal nor vertical lines are preserved. The image of the grid of unit
squares is a grid of congruent parallelograms, but there is apparently little to connect
the two grids geometrically. Notice, however, that the diagonals of the square grid
are invariant; they are shown dotted in the figure. The image of a vector that lies on
the diagonals is just a multiple of itself:

$$M_4\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
M_4\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = -\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

Specifically, the diagonal in the first and third quadrants is stretched by the factor 3,
whereas the diagonal in the second and fourth is simply flipped, with no change in
length. The presence of a negative multiplier suggests that orientation is reversed,
and that is confirmed by the clockwise circuit in the image.

In the figure below, we have switched to a new grid that is parallel to the invariant
diagonals in order to see the geometric action of $M_4$ more clearly. The basis vectors
for the new grid (in both source and target) are

$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

Every point (or vector) in the source now has new coordinates $(\bar u, \bar v)$ as well as the
original coordinates $(u,v)$, so we must be able to change from one to the other. To
see how, suppose vector $\mathbf{p}$ has new coordinates $(\bar u, \bar v)$; then

$$\mathbf{p} = \bar u\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \bar v\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} \bar u - \bar v \\ \bar u + \bar v \end{pmatrix}.$$

Hence the coordinate change and its inverse are

$$\begin{aligned} u &= \bar u - \bar v, & \bar u &= \tfrac12(u + v), \\ v &= \bar u + \bar v, & \bar v &= \tfrac12(-u + v). \end{aligned}$$

The coordinates $(x,y)$ and $(\bar x, \bar y)$ change the same way, of course. Using the coordinate
changes we can transform the original formulas for the linear map $M_4$ into
formulas that use the new coordinates:

$$\begin{aligned}
\bar x &= \tfrac12(x + y) = \tfrac12(u + 2v + 2u + v) = \tfrac12(3u + 3v) = 3\bar u, \\
\bar y &= \tfrac12(-x + y) = \tfrac12(-u - 2v + 2u + v) = \tfrac12(u - v) = -\bar v.
\end{aligned}$$






















Thus, in the new coordinates, our linear map is described by a new matrix:

$$\overline{M}_4 : \begin{cases} \bar x = 3\bar u, \\ \bar y = -\bar v, \end{cases} \qquad \overline{M}_4 = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}.$$

With the new matrix, there is no doubt that $M_4$ has the same kind of geometric
action as $M_1$ and $M_2$ (but not $M_3$!); as a combination of stretches in two different
directions, it is a strain. Thus, a coordinate change can bring clarity and simplicity
to the study of linear maps, just as it can for the study of integration.

It is worth seeing the connection between $M_4$ and $\overline{M}_4$ directly in terms of matrices.
The coordinate change itself is a matrix multiplication, $U = G\overline{U}$, $\overline{U} = G^{-1}U$,
where

$$U = \begin{pmatrix} u \\ v \end{pmatrix}, \quad \overline{U} = \begin{pmatrix} \bar u \\ \bar v \end{pmatrix}, \quad
G = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \quad G^{-1} = \frac12\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.$$

(Notice that the columns of $G$ are the coordinates of the new basis with respect to
the old.) The same change $X = G\overline{X}$ and $\overline{X} = G^{-1}X$ happens in the target. In the new
coordinates, the map $X = M_4 U$ takes the form

$$\overline{X} = G^{-1}X = G^{-1}M_4 U = G^{-1}M_4 G\,\overline{U} = \overline{M}_4\overline{U}.$$

For us, the object of this string of equalities is the conclusion

$$\overline{M}_4 = G^{-1}M_4 G,$$

which leads, finally, to the following definition.

Definition 2.1 Suppose $A$ and $B$ are $n \times n$ matrices; then we say that $B$ is equivalent
to $A$ if there is an invertible matrix $G$ for which $B = G^{-1}AG$.

What we have just shown about $M_4$ and $\overline{M}_4$ implies that if $B$ is equivalent to
$A$, then there is a basis of $\mathbb{R}^n$ on which $A$ acts in the same way that $B$ acts on the
standard basis of $\mathbb{R}^n$. Alternatively, $A$ and $B$ represent the same linear map in different
coordinates. The matrix $G$, in $B = G^{-1}AG$, represents the coordinate change.
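The product $G^{-1}M_4G$ can be carried out mechanically. The sketch below is a hedged illustration: the entries of $M_4$ and $G$ are read off from the formulas $x = u + 2v$, $y = 2u + v$ and $u = \bar u - \bar v$, $v = \bar u + \bar v$ used in the computation above, while the helper functions `matmul` and `inverse` are assumptions introduced here. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(A):
    """Inverse of a 2 x 2 matrix with nonzero determinant, in exact arithmetic."""
    (a, b), (c, d) = A
    determinant = Fraction(a * d - b * c)
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]

M4 = [[1, 2], [2, 1]]    # x = u + 2v, y = 2u + v
G = [[1, -1], [1, 1]]    # columns are the new basis vectors (1, 1) and (-1, 1)

M4_bar = matmul(inverse(G), matmul(M4, G))
print(M4_bar)            # diagonal entries are the multipliers 3 and -1
```

The result is the diagonal matrix with entries 3 and $-1$, matching the multipliers found on the two diagonals.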

Note: if $B = G^{-1}AG$, then $A = H^{-1}BH$ where $H = G^{-1}$, so $A$ is equivalent to $B$
when $B$ is equivalent to $A$. This allows us to say, more symmetrically, that “$A$ and $B$
are equivalent.” In the exercises you are asked to show that if $C$ is equivalent to $B$
and $B$ is equivalent to $A$, then $C$ is also equivalent to $A$. In other words, equivalent
matrices are always mutually equivalent. We define an equivalence class of $n \times n$
matrices to be the set of all matrices equivalent to some given one. (An example
of an equivalence class in a more familiar context is a rational number: a rational
number is a set of mutually equivalent integer fractions, where two such fractions
$a/b$ and $c/d$ are defined to be equivalent if $ad = bc$.) Our aim, which is to identify
the different geometric actions of a linear map $M : \mathbb{R}^2 \to \mathbb{R}^2$, is accomplished by
determining the equivalence classes of $2 \times 2$ matrices.




In all our examples where there were invariant lines, those lines were mutually
perpendicular. Our next example shows us we cannot expect this to happen in general.



In the figure below, it may appear that $M_5$ leaves vertical lines invariant. But this is
not true: a vertical line in the target is the image of a horizontal line in the source,
not a vertical one. In fact, the directions of the invariant lines are indicated by the
heavy vectors and the dotted lines in that figure, because





We show how to find invariant lines for an arbitrary linear map immediately below.
For the moment, we just observe that $M_5$ has the same length multipliers as $M_4$: 3
and $-1$. The two maps have different invariant lines, though; in particular, the ones
for $M_5$ are not mutually perpendicular. Nevertheless, it is reasonable to call $M_5$ a
strain. With a new coordinate system and grid that is based on vectors along the
invariant lines (cf. Exercise 2.2), $M_5$ has the following form.

























We discover that $M_5$ is in the same equivalence class as $M_4$ (because they are
both equivalent to $\overline{M}_5 = \overline{M}_4$). Indeed, they have the same geometric description:
they map the plane to itself by stretching it by a factor of 3 in one direction and
simply flipping it, without a stretch, in another. Yet $M_5$ and $M_4$ are not the same,
because they perform their stretches and flips along different lines (their own invariant
lines). It is evident, though, that the invariant lines and the associated multipliers,
taken together, characterize each linear map geometrically: two maps with the same
multipliers acting on the same lines must be identical.

Multipliers and invariant lines therefore give us an important way to characterize
a linear map. They are introduced in the following definition with their usual
names. Rather curiously, historical accident has cast those names (eigenvalue and
eigenvector) half in German and half in English. Eigen means “one’s own”, or
“characteristic”, and the alternatives characteristic value and characteristic vector
are also used. Furthermore, eigen can be translated into French as propre, and the
terms proper value and proper vector are likewise in use, but less frequently.

Definition 2.2 Let $M : \mathbb{R}^n \to \mathbb{R}^n : U \mapsto X$ be a linear map defined by matrix
multiplication: $X = MU$. A vector $U \ne 0$ is an eigenvector of $M$ with eigenvalue $\lambda$ if
$MU = \lambda U$.

Note that an eigenvector is nonzero by definition because it has to determine an
invariant line. An eigenvalue can be 0, though; it just means $M$ has a nonzero kernel
consisting of the eigenvectors with eigenvalue zero.

Let us rewrite the “eigen” condition $MU = \lambda U$ first as $MU = \lambda IU$ (where $I$ is
the identity matrix), and then as $(M - \lambda I)U = 0$. This says that $U$ is in the kernel
of the newly defined matrix $M - \lambda I$. But $U \ne 0$, so $M - \lambda I$ must be noninvertible,
implying $\det(M - \lambda I) = 0$. Because the determinant of a matrix is a polynomial
function of the elements of the matrix, the expression $p(\lambda) = \det(M - \lambda I)$ is a
polynomial in $\lambda$, called the characteristic polynomial of $M$. The equation $p(\lambda) = 0$
is the characteristic equation of $M$.

Theorem 2.1. Each eigenvalue of M is a root of its characteristic equation. □ 

But real polynomials can have complex roots, too. For example, our rotation
matrix $M_3$ has the characteristic polynomial $p(\lambda) = \lambda^2 + 4$ whose roots are $\lambda = \pm 2i$.
Furthermore,

$$M_3\begin{pmatrix} 1 \\ -i \end{pmatrix} = \begin{pmatrix} 2i \\ 2 \end{pmatrix} = 2i\begin{pmatrix} 1 \\ -i \end{pmatrix}, \qquad
M_3\begin{pmatrix} 1 \\ i \end{pmatrix} = \begin{pmatrix} -2i \\ 2 \end{pmatrix} = -2i\begin{pmatrix} 1 \\ i \end{pmatrix}.$$

In other words, when we allow $M_3$ to act on ordered pairs of complex numbers, we
find that $M_3$ does have invariant directions. A real polynomial always has complex
roots, but need not have any real roots. Thus this example suggests that, for the
purpose of getting the simple view of the action of matrix multiplication, we use
complex $n$-tuples instead of real ones ($M : \mathbb{C}^n \to \mathbb{C}^n$) to define eigenvectors and
eigenvalues (Definition 2.2). With this understanding, every root of the characteristic
equation of $M$ becomes an eigenvalue of $M$.
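Python's built-in complex numbers make it easy to confirm that $M_3$ has invariant directions once complex pairs are allowed. In the sketch below (illustrative only), the candidate eigenvectors $(1, -i)$ and $(1, i)$ were obtained by solving $(M_3 - \lambda I)U = 0$ for $\lambda = \pm 2i$ and are an assumption of this example; the helper `apply` is likewise hypothetical.

```python
M3 = ((0, -2), (2, 0))   # the rotation-dilation matrix of the third example

def apply(matrix, vec):
    """Multiply a 2 x 2 matrix by a vector whose entries may be complex."""
    (a, b), (c, d) = matrix
    u, v = vec
    return (a * u + b * v, c * u + d * v)

# Candidate (eigenvalue, eigenvector) pairs over the complex numbers.
candidates = ((2j, (1, -1j)), (-2j, (1, 1j)))

errors = []
for lam, vec in candidates:
    image = apply(M3, vec)
    scaled = tuple(lam * z for z in vec)
    # Size of the difference between M3 * U and lambda * U.
    errors.append(max(abs(p - q) for p, q in zip(image, scaled)))
    print(lam, image, scaled)
```

Each image is exactly the eigenvalue times the vector, so every entry of `errors` is zero.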




If $M$ is a $2 \times 2$ matrix, we have

$$\begin{aligned}
p(\lambda) = \det(M - \lambda I) &= \det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} \\
&= (a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + ad - bc \\
&= \lambda^2 - \operatorname{tr}(M)\,\lambda + \det(M),
\end{aligned}$$

where $\operatorname{tr}(M) = a + d$ is the trace of $M$ and $\det(M) = ad - bc$ is, of course, its
determinant. Thus, the eigenvalues of $M$ are the roots of a quadratic equation that
involves the trace and determinant of $M$. If $\lambda_1$ and $\lambda_2$ are these roots, then

$$(\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2$$

is also the characteristic polynomial, so we have the following proposition.

Theorem 2.2. The sum of the eigenvalues of a $2 \times 2$ matrix is equal to its trace and
their product is equal to its determinant. □

If we write the equation for the eigenvalues of the $2 \times 2$ matrix $M$ in the form

$$\lambda_{1,2} = \frac{\operatorname{tr} M \pm \sqrt{\operatorname{tr}^2 M - 4\det M}}{2},$$

we see these roots will be complex when the discriminant is negative:

$$\operatorname{tr}^2 M - 4\det M = (a + d)^2 - 4(ad - bc) = (a - d)^2 + 4bc < 0.$$

Because $(a - d)^2 \ge 0$, $b$ and $c$ must be of opposite sign and have $bc < -(a - d)^2/4$
for $M$ to have complex eigenvalues.
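The quadratic formula for the eigenvalues translates directly into a few lines of Python. This is a minimal sketch rather than a general-purpose routine: the function name `eigenvalues_2x2` is an assumption, and `cmath.sqrt` from the standard library is used so that a negative discriminant yields complex roots instead of an error. The test matrices are $M_4$ (with entries as read from $x = u + 2v$, $y = 2u + v$) and the rotation-dilation $M_3$.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda^2 - tr(M) lambda + det(M) = 0 for M = [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)   # complex if the discriminant is negative
    return ((trace + root) / 2, (trace - root) / 2)

strain = eigenvalues_2x2(1, 2, 2, 1)      # real eigenvalues 3 and -1
rotation = eigenvalues_2x2(0, -2, 2, 0)   # complex eigenvalues 2i and -2i

print(strain, rotation)
# Theorem 2.2: the sum is the trace and the product is the determinant.
print(sum(strain), strain[0] * strain[1])
print(sum(rotation), rotation[0] * rotation[1])
```

For $M_3$ the entries $b = -2$ and $c = 2$ have opposite signs and $bc = -4 < 0 = -(a-d)^2/4$, which is the condition for complex eigenvalues stated above.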

As we have seen, equivalent matrices describe the same linear map but in terms
of different bases. We would expect, then, that such matrices have the same eigenvalues,
and their eigenvectors would be mapped to one another by the coordinate
change that connects the matrices.

Theorem 2.3. Suppose $A$ and $B = G^{-1}AG$ are equivalent matrices, and $\overline{U}$ is an
eigenvector of $B$ with eigenvalue $\lambda$. Then $U = G\overline{U}$ is an eigenvector of $A$ with the
same eigenvalue $\lambda$.

Proof. Suppose $\overline{U}$ is an eigenvector of $B$ with eigenvalue $\lambda$: $B\overline{U} = \lambda\overline{U}$. Then

$$G^{-1}AG\,\overline{U} = \lambda\overline{U}, \quad\text{so}\quad A(G\overline{U}) = G\lambda\overline{U} = \lambda(G\overline{U}). \qquad □$$

Corollary 2.4 Equivalent matrices have the same eigenvalues and therefore the
same trace, determinant, and characteristic polynomial.

Proof. According to the theorem, every eigenvalue of $B = G^{-1}AG$ is an eigenvalue
of $A$. But equivalence is symmetric ($A = H^{-1}BH$ with $H = G^{-1}$), so every eigenvalue
of $A$ is an eigenvalue of $B$. □
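Theorem 2.3 and Corollary 2.4 can be watched in action on a small example. Everything specific in the sketch below is hypothetical and chosen only for illustration: the matrices `A` and `G`, the eigenvector `U_bar` of `B` (found by hand for eigenvalue 3), and the helper functions.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    """Product of a 2 x 2 matrix with a vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def inverse(A):
    """Exact inverse of a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = A
    determinant = Fraction(a * d - b * c)
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]

def trace(A):
    return A[0][0] + A[1][1]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[2, 1], [1, 2]]     # hypothetical matrix with eigenvalues 3 and 1
G = [[2, 1], [1, 1]]     # hypothetical invertible coordinate change
B = matmul(inverse(G), matmul(A, G))

U_bar = [0, 1]           # eigenvector of B with eigenvalue 3
U = matvec(G, U_bar)     # the theorem says this is an eigenvector of A

print(B, trace(A), trace(B), det(A), det(B))
print(matvec(B, U_bar), matvec(A, U))
```

Here `B` works out to the matrix with rows (1, 0) and (3, 3), which looks quite different from `A` yet shares its trace 4 and determinant 3.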






Even when the eigenvalues and eigenvectors of $M$ are complex, they can provide
crucial information about the geometric action of $M$ on the real plane $\mathbb{R}^2$. Consider
the map



Here $\operatorname{tr} M_6 = 0$ and $\det M_6 = 1$, so the characteristic polynomial is $\lambda^2 + 1$ and the
eigenvalues are $\pm i$. However, $M_6$ is not a rotation: it turns the coordinate axes by
different amounts (so their images are not perpendicular, as they would be under a
rotation).




A clearer picture emerges, however, when we consider the action of $M_6$ on the heavy
vectors; note that

$$M_6(\mathbf{u}) = \mathbf{y}, \qquad M_6(\mathbf{v}) = -\mathbf{x}.$$

Consequently, in terms of the new grid and coordinates (which use $\{\mathbf{u}, \mathbf{v}\}$ and $\{\mathbf{x}, \mathbf{y}\}$
as bases in the source and target), our linear map is now described by

$$\overline{M}_6 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

You can check directly that the new matrix $\overline{M}_6$ has the same trace, determinant,
characteristic polynomial, and eigenvalues as $M_6$. But the geometric action of $\overline{M}_6$
is simpler to describe: applied to the standard basis (instead of $\{\mathbf{u}, \mathbf{v}\}$), $\overline{M}_6$ is just
rotation by 90°. Note, however, that $\overline{M}_6$, applied to the basis $\{\mathbf{u}, \mathbf{v}\}$, is not a rotation,
because $M_6$ is not a rotation. Thus, although $M_6$ is not a rotation, it is equivalent to
a 90° rotation.





"■=G'o 2 ) 


There is essentially only one more type of linear map for us to analyze. Here is 
an example: 





The horizontal lines are invariant, but the vertical ones are not. In fact, there is no
second set of invariant lines. We can trace this shortcoming to the fact that $M_7$ has
only one eigenvalue, $\lambda = 2$; it is a repeated root of the characteristic polynomial
$\lambda^2 - 4\lambda + 4 = (\lambda - 2)^2$. All eigenvectors are therefore the solutions of the single
pair of equations




giving just 




This eigenvector is horizontal; it implies that horizontal lines are invariant under $M_7$.
But because no vector in any other direction is an eigenvector, no other direction
is invariant. The geometric action of $M_7$ is called a shear. Of course we associate
eigenvalues with stretches; in this example the shear is combined with a uniform
dilation whose magnitude is given by the single eigenvalue 2.

A shear can take a less recognizable form, as in the following example. 


$$M_8 : \begin{cases} x = 4u - 2v, \\ y = 2u, \end{cases} \qquad M_8 = \begin{pmatrix} 4 & -2 \\ 2 & 0 \end{pmatrix}.$$

Because $\operatorname{tr} M_8 = \det M_8 = 4$, $M_8$ has the single eigenvalue $\lambda = 2$. There is an
eigenvector in only one direction. In the exercises you are asked to find that eigenvector
and then to verify that the coordinate change










































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G : 


u = u + v, 
v = 17, 


converts $M_8$ into $M_7$: $G^{-1}M_8G = M_7$. This implies $M_8$ is equivalent to a shear
combined with a uniform dilation by the factor 2.
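Some of the claims about $M_8$ can be checked without finding the eigenvector itself. The sketch below is an illustration under stated assumptions (the helper `matmul` and the variable names are not from the text): it confirms that the discriminant $\operatorname{tr}^2 M_8 - 4\det M_8$ vanishes, so $\lambda = 2$ is a repeated root, and that $N = M_8 - 2I$ is a nonzero matrix whose square is zero. A matrix with two independent eigenvectors for $\lambda = 2$ would have to equal $2I$, making $N$ zero, so the nonzero $N$ is what signals a shear.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M8 = [[4, -2], [2, 0]]   # x = 4u - 2v, y = 2u

trace = M8[0][0] + M8[1][1]
det = M8[0][0] * M8[1][1] - M8[0][1] * M8[1][0]
discriminant = trace * trace - 4 * det     # zero means a repeated eigenvalue
lam = trace // 2                           # the repeated eigenvalue

# N = M8 - lam * I measures how far M8 is from the pure dilation lam * I.
N = [[M8[0][0] - lam, M8[0][1]], [M8[1][0], M8[1][1] - lam]]
N_squared = matmul(N, N)

print(trace, det, discriminant, lam)
print(N, N_squared)
```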

Before we proceed to a description of all the different geometric actions of a
linear map $M : \mathbb{R}^2 \to \mathbb{R}^2$, it is helpful to comment on a few specific matrices. The
matrix

\[
R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]

rotates the plane by $\theta$ radians; it has complex eigenvalues $\lambda_\pm = \cos\theta \pm i\sin\theta$. The
matrix

\[
C_{a,b} = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}
\]

has a similar form, and has the complex eigenvalues $\lambda_\pm = a \pm ib$, but is not a simple
rotation if $a^2 + b^2 \ne 1$. However, the following theorem connects it to a rotation.


Theorem 2.5. If $(a,b) \ne (0,0)$, then the matrix $C_{a,b}$ rotates the plane by the angle
$\theta = \arctan(b/a)$ and then performs a uniform dilation by the factor $\sqrt{a^2 + b^2}$.


Proof. By hypothesis, $\sqrt{a^2 + b^2} \ne 0$, so we can factor this term out of each component
of $C_{a,b}$:

\[
C_{a,b} = \sqrt{a^2 + b^2}
\begin{pmatrix}
\dfrac{a}{\sqrt{a^2+b^2}} & \dfrac{-b}{\sqrt{a^2+b^2}} \\[2ex]
\dfrac{b}{\sqrt{a^2+b^2}} & \dfrac{a}{\sqrt{a^2+b^2}}
\end{pmatrix}
= \sqrt{a^2 + b^2}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]

In the matrix on the right, we have made the replacements

\[
\frac{a}{\sqrt{a^2+b^2}} = \cos\theta
\quad\text{and}\quad
\frac{b}{\sqrt{a^2+b^2}} = \sin\theta
\]

by using the angle $\theta = \arctan(b/a)$, as the figure shows. In fact, we can extend
$\theta = \arctan(b/a)$ as a function of two variables $a$ and $b$ (cf. Exercise 2.10) to define
a unique value of $\theta$ in the interval $-\pi < \theta \le \pi$ for every $(a,b) \ne (0,0)$. That is, we
need not require that $a$ and $b$ be positive. Therefore,

\[
C_{a,b} = \sqrt{a^2 + b^2}\, R_\theta. \qquad \square
\]
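The factorization in Theorem 2.5 can be checked numerically. The sketch below is only an illustration: the helper names `rotation_dilation`, `apply_C`, and `rotate_then_dilate` are hypothetical, and it assumes that the standard two-argument arctangent `math.atan2(b, a)` plays the role of the extended function $\arctan(b/a)$ mentioned in the proof.

```python
import math

def rotation_dilation(a, b):
    """Return (theta, r) such that C_{a,b} = r * R_theta, for (a, b) != (0, 0)."""
    if a == 0 and b == 0:
        raise ValueError("(a, b) must differ from (0, 0)")
    r = math.hypot(a, b)       # dilation factor sqrt(a^2 + b^2)
    theta = math.atan2(b, a)   # two-variable arctangent; works when a <= 0 too
    return theta, r

def apply_C(a, b, x, y):
    """Apply the matrix [[a, -b], [b, a]] to the vector (x, y)."""
    return a * x - b * y, b * x + a * y

def rotate_then_dilate(theta, r, x, y):
    """Rotate (x, y) by theta radians, then scale the result by r."""
    c, s = math.cos(theta), math.sin(theta)
    return r * (c * x - s * y), r * (s * x + c * y)

# a < 0 here, so the one-variable arctan(b/a) alone would pick the wrong quadrant.
theta, r = rotation_dilation(-3.0, 4.0)
direct = apply_C(-3.0, 4.0, 1.0, 2.0)
factored = rotate_then_dilate(theta, r, 1.0, 2.0)
print(r, direct, factored)
```

The two results agree up to rounding, which is the content of the theorem for this particular choice of $a$, $b$, and test vector.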

Suppose M has an eigenvector U with eigenvalue 0; then M collapses R 2 along 
the direction of U. For this reason, we describe any matrix with a zero eigenvalue 
as a collapse. A rather special example is 


K = 



with eigenvector U 



Rotations


Collapse and
shear-collapse


Because $\operatorname{tr} K = \det K = 0$, the characteristic equation of $K$ is $\lambda^2 = 0$. This has only
a single root, the repeated eigenvalue $\lambda = 0$. Because there is only a single eigendirection,
given by the eigenvector U, above, $K$ behaves like a shear. Because its sole
eigenvalue is 0, it is also a collapse; we call it a shear-collapse.

Theorem 2.6. Every linear map M : R 2 —> R 2 is equivalent to precisely one of the 
types listed in the following table; M lies in the equivalence class of matrices that 
have the same eigenvalues and the same number of eigendirections. 


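Equivalence Classes of 2 × 2 Matrices
and Their Representatives

Name                Matrix                                                                        Eigenvalues*                  Eigendirections

Zero                $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$                                 $0, 0$                        all

Shear-collapse      $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$                                 $0, 0$                        one

Strain-collapse     $\begin{pmatrix} 0 & 0 \\ 0 & \lambda \end{pmatrix}$                           $0, \lambda$                  two

Pure dilation       $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$                     $\lambda, \lambda$            all

Shear-dilation      $\begin{pmatrix} \lambda & \lambda \\ 0 & \lambda \end{pmatrix}$               $\lambda, \lambda$            one

Strain              $\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$                 $\lambda_1 \ne \lambda_2$     two

Rotation-dilation   $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$                                $a \pm ib$                    none

* An eigenvalue not written as 0 is understood to be nonzero.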


Proof. You carry out parts of the proof in the exercises. The basic classification is
by the eigenvalues of M:

• Real and equal: zero, shear-collapse, pure dilation, shear-dilation

• Real and unequal: strain-collapse, strain

• Complex conjugates: rotation-dilation

Types are then further separated by the number of eigendirections that M has:

• None: rotation-dilation

• One: all shears

• Two: all strains

• All: pure dilations, including zero □
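The two-stage sorting used in the proof (first by eigenvalues, then by eigendirections) can be sketched as a short routine. This is an illustration only: the function name `classify` is hypothetical, and the sketch assumes exact arithmetic (integer or `Fraction` entries), since the tests for equality would be unreliable with floating-point input. It reads the eigenvalue type from the discriminant of the characteristic polynomial $\lambda^2 - (\operatorname{tr} M)\lambda + \det M$.

```python
def classify(a, b, c, d):
    """Name the equivalence class of the matrix [[a, b], [c, d]] (exact entries assumed)."""
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det            # discriminant of the characteristic polynomial
    if disc < 0:                        # complex conjugate eigenvalues, no eigendirections
        return "rotation-dilation"
    if disc > 0:                        # real and unequal eigenvalues, two eigendirections
        return "strain-collapse" if det == 0 else "strain"
    # Real and equal eigenvalues: lambda = tr / 2.
    if b == 0 and c == 0:               # scalar matrix, every direction is an eigendirection
        return "zero" if tr == 0 else "pure dilation"
    return "shear-collapse" if tr == 0 else "shear-dilation"   # one eigendirection

print(classify(4, -2, 2, 0))    # the map x = 4u - 2v, y = 2u
print(classify(0, -1, 1, 0))    # rotation by 90 degrees
print(classify(1, 0, 0, 0))     # projection onto the first axis
```

When the discriminant is zero and the off-diagonal entries both vanish, the diagonal entries are forced to be equal, so the matrix is a multiple of the identity; otherwise the repeated eigenvalue has only one eigendirection.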

If a matrix of a linear map has real eigenvalues and eigenvectors, the eigenvectors
determine the map's invariant lines and the eigenvalues give the length multiplication
factors along those lines. But even more is true: the product of those factors then
tells us how much the map magnifies areas; the sign of the product even indicates
how the map affects orientation. Furthermore, the area multiplier is just the determinant
of the matrix (because the product of the eigenvalues is the determinant), so
the area multiplier can be determined directly from the matrix itself, without first
calculating the eigenvalues. (This is particularly useful when the eigenvalues and
eigenvectors are complex because then the matrix has no usable length multipliers.)

We need a notation that indicates the orientation of a parallelogram, and this is
easily obtained. We use $\mathbf{v} \wedge \mathbf{w}$ to denote the parallelogram spanned by the vectors
$\mathbf{v}$ and $\mathbf{w}$, in that order. Call this the wedge product of $\mathbf{v}$ and $\mathbf{w}$. The order determines
the "sense of rotation" (either clockwise or counterclockwise) that carries
the first-named vector, $\mathbf{v}$, to the second, $\mathbf{w}$. Reversing the order, to $\mathbf{w} \wedge \mathbf{v}$, reverses
the sense of rotation; we write $\mathbf{w} \wedge \mathbf{v} = -\mathbf{v} \wedge \mathbf{w}$. A parallelogram has positive orientation
if it has the same sense of rotation as the positive coordinate axes, and
negative orientation if it has the opposite sense. (If $\mathbf{v}$ and $\mathbf{w}$ are linearly dependent,
$\mathbf{v} \wedge \mathbf{w}$ collapses to a line segment and has no orientation.) As a rule, we take the
positive sense of rotation to be counterclockwise. Thus, in the adjacent figure, $\mathbf{v} \wedge \mathbf{w}$
is negatively oriented and $\mathbf{w} \wedge \mathbf{v}$ is positively oriented.

The signed area, $\operatorname{area} \mathbf{v} \wedge \mathbf{w}$, will then be determined by the following two stipulations.
First, the signed area of the unit square $\mathbf{e}_1 \wedge \mathbf{e}_2$ should be $+1$ (rather than $-1$).
Second, $\operatorname{area}(\mathbf{w} \wedge \mathbf{v}) = -\operatorname{area}(\mathbf{v} \wedge \mathbf{w})$ for all $\mathbf{v}$, $\mathbf{w}$.

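Theorem 2.7. $\operatorname{area}\left( \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \wedge \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \right) = \det \begin{pmatrix} v_1 & w_1 \\ v_2 & w_2 \end{pmatrix}$.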

Proof. See Exercise 2.15. □ 

The signed area is the determinant of the matrix $V$ whose columns are the coordinates
of $\mathbf{v}$ and $\mathbf{w}$, in that order. The matrix represents a linear map

\[
\mathbf{x} = V(s,t) = s\mathbf{v} + t\mathbf{w}
\]

that maps the unit square $\mathbf{e}_1 \wedge \mathbf{e}_2$ to $\mathbf{v} \wedge \mathbf{w}$. Thus, the orientation and area of $\mathbf{v} \wedge \mathbf{w}$ are
determined by a parametrization.


Area multiplier


Orientation and
signed area


Parametrizing $\mathbf{v} \wedge \mathbf{w}$


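$\mathbf{x} \wedge \mathbf{y}$ in $\mathbb{R}^n$, $n \ge 3$


$\mathbf{x} \times \mathbf{y}$ in $\mathbb{R}^3$


$\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ in $\mathbb{R}^3$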


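Theorem 2.8. If $M : \mathbb{R}^2 \to \mathbb{R}^2$ is a linear map and $\mathbf{v} \wedge \mathbf{w}$ is an oriented parallelogram
in the source, then $M(\mathbf{v} \wedge \mathbf{w}) = M(\mathbf{v}) \wedge M(\mathbf{w})$ is an oriented parallelogram in
the target and

\[
\operatorname{area} M(\mathbf{v} \wedge \mathbf{w}) = \det M \times \operatorname{area}(\mathbf{v} \wedge \mathbf{w}).
\]

Proof. In Exercise 2.16 you are asked to prove this directly by analyzing the function
$\operatorname{area} M(\mathbf{v}) \wedge M(\mathbf{w})$. □

Corollary 2.9 The area multiplier of the linear map $M : \mathbb{R}^2 \to \mathbb{R}^2$ is $\det M$. The
map $M$ reverses orientation precisely when $\det M < 0$. □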
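A small computation illustrates the signed area and the area multiplier together. The sketch is hedged: the helper names `signed_area`, `det2`, and `apply` are hypothetical, and the matrix used is the one with rows $(4, -2)$ and $(2, 0)$ taken from the equations $x = 4u - 2v$, $y = 2u$ of the shear example. The second map, which swaps the two coordinates, is an added example of an orientation-reversing map.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def signed_area(v, w):
    """Signed area of v ^ w: determinant of the matrix whose columns are v and w."""
    return v[0] * w[1] - v[1] * w[0]

def apply(m, v):
    """Image of the vector v under the matrix m."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

M = ((4, -2), (2, 0))      # x = 4u - 2v, y = 2u
F = ((0, 1), (1, 0))       # swaps the coordinates; det F = -1
v, w = (1, 0), (1, 3)

before = signed_area(v, w)
after_M = signed_area(apply(M, v), apply(M, w))
after_F = signed_area(apply(F, v), apply(F, w))
print(before, after_M, after_F)
```

Reversing the order of the two vectors changes the sign of `signed_area`, and the image areas equal the original area times the determinant of the map, with a sign change exactly for the map of negative determinant.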


2.2 Maps from $\mathbb{R}^n$ to $\mathbb{R}^n$

Because we found the area multiplier to be the most salient geometric feature of a
linear map of the plane, we can expect that the volume multiplier, and its higher-dimensional
analogues, will play a similar role here.

In $\mathbb{R}^n$, $n \ge 3$, we continue to use $\mathbf{x} \wedge \mathbf{y}$ to denote the oriented parallelogram
spanned by the vectors $\mathbf{x}$ and $\mathbf{y}$. When $n = 2$, the orientation of $\mathbf{x} \wedge \mathbf{y}$ is fixed in
relation to the orientation of the two coordinate axes. However, when $n \ge 3$, this is
not true: an orientation-preserving linear map of $\mathbb{R}^n$ can reverse the orientation of
$\mathbf{x} \wedge \mathbf{y}$ (see below, p. 44). Moreover, because the coordinates of $\mathbf{x}$ and $\mathbf{y}$ now make
up an $n \times 2$ matrix $V$, for which the determinant is not even defined, we cannot
express $\operatorname{area}(\mathbf{x} \wedge \mathbf{y})$ as the determinant of $V$. (Let $V^t$ be the transpose of the matrix
$V$; it is a $2 \times n$ matrix. The product $V^t V$ does give a square $2 \times 2$ matrix, and
$\operatorname{area}^2(\mathbf{x} \wedge \mathbf{y}) = \det(V^t V)$. See the exercises.)
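The parenthetical formula can be tried out directly. In the sketch below the function name `gram_det` is hypothetical; it computes the determinant of the $2 \times 2$ product of the transpose of $V$ with $V$ from the three dot products that make up that product. The check compares a parallelogram in the plane with the same parallelogram placed in $\mathbb{R}^4$ by padding the coordinates with zeros, an example chosen here for illustration.

```python
def gram_det(x, y):
    """det(V^t V) for the n x 2 matrix V whose columns are x and y."""
    xx = sum(a * a for a in x)               # x . x
    yy = sum(b * b for b in y)               # y . y
    xy = sum(a * b for a, b in zip(x, y))    # x . y
    return xx * yy - xy * xy

# In the plane the area is an ordinary 2 x 2 determinant.
x2, y2 = (3, 1), (1, 2)
plane_area = x2[0] * y2[1] - x2[1] * y2[0]

# The same parallelogram, now sitting in R^4.
x4, y4 = (3, 1, 0, 0), (1, 2, 0, 0)
print(plane_area ** 2, gram_det(x2, y2), gram_det(x4, y4))
```

All three printed numbers agree, so the squared area is unchanged by the embedding even though the $4 \times 2$ coordinate matrix has no determinant of its own.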

In $\mathbb{R}^3$, the cross-product of two vectors is defined: $\mathbf{p} = \mathbf{x} \times \mathbf{y}$ is the unique vector
with length $|\operatorname{area}(\mathbf{x} \wedge \mathbf{y})|$ and with direction orthogonal to both $\mathbf{x}$ and $\mathbf{y}$ so that the
three vectors $\mathbf{x}$, $\mathbf{y}$, $\mathbf{p}$, in that order, have the same orientation as the three coordinate
axes. We call this the positive orientation, and always take it to be right-handed,
meaning that the thumb, index finger, and middle finger of the right hand
can be lined up with the first, second, and third coordinate axes, respectively.

[Figure: the vector $\mathbf{p} = \mathbf{x} \times \mathbf{y}$ and the parallelogram $\mathbf{x} \wedge \mathbf{y}$.]

For vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ in $\mathbb{R}^3$, we define $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ to be the oriented parallelepiped
spanned by $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, in that order. Notice that the parallelepiped shown in the
figure on the right, above, has left-handed, or negative, orientation. To calculate its
volume, we take $\mathbf{x} \wedge \mathbf{y}$ as base and measure its height as the length of the projection
of $\mathbf{z}$ on $\mathbf{p} = \mathbf{x} \times \mathbf{y}$:


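\[
\operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z})
= \text{area of base} \cdot \text{height}
= \|\mathbf{x} \times \mathbf{y}\| \, \frac{(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}}{\|\mathbf{x} \times \mathbf{y}\|}
= (\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}.
\]

The quantity $(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z}$ is called the scalar triple product of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. Note
that the parentheses can be removed, because $\mathbf{x} \times (\mathbf{y} \cdot \mathbf{z})$ is meaningless. The order
of the three vectors in $\mathbf{x} \times \mathbf{y} \cdot \mathbf{z}$ is still important, though.

Theorem 2.10. The signed volume of the oriented parallelepiped $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ is the
scalar triple product $\mathbf{x} \times \mathbf{y} \cdot \mathbf{z}$. The volume is negative precisely when the parallelepiped
has negative orientation.

Proof. The first statement has been proven above. To prove the second, note that
the parallelepiped $\mathbf{x} \wedge \mathbf{y} \wedge (\mathbf{x} \times \mathbf{y})$ has positive orientation by definition, at least when
$\mathbf{x} \times \mathbf{y} \ne \mathbf{0}$. Therefore, $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ has negative orientation precisely when $\mathbf{z}$ and $\mathbf{x} \times \mathbf{y}$
lie on opposite sides of the plane determined by $\mathbf{x} \wedge \mathbf{y}$. But $\mathbf{x} \times \mathbf{y}$ is perpendicular to
$\mathbf{x} \wedge \mathbf{y}$, so $\mathbf{z}$ and $\mathbf{x} \times \mathbf{y}$ are on opposite sides of $\mathbf{x} \wedge \mathbf{y}$ when $\mathbf{z}$ makes an obtuse angle
with $\mathbf{x} \times \mathbf{y}$, and that is precisely the condition that

\[
(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z} = \operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}) < 0.
\]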


□ 


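Scalar triple product


Volumes and
determinants


Theorem 2.11. Let $V$ be the matrix whose columns are the coordinates of $\mathbf{x}$, $\mathbf{y}$, and
$\mathbf{z}$, in that order. Then $\operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}) = \det V$.

Proof. Let $\mathbf{x} = (x_1, x_2, x_3)^t$, $\mathbf{y} = (y_1, y_2, y_3)^t$, $\mathbf{z} = (z_1, z_2, z_3)^t$. Then

\[
V = \begin{pmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{pmatrix},
\]

and if we calculate the determinant of $V$ along the third column, we get

\[
\det V =
\begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix} z_1
+ \begin{vmatrix} x_3 & y_3 \\ x_1 & y_1 \end{vmatrix} z_2
+ \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} z_3;
\]

note the order of the rows in the second determinant. On the other hand,

\[
\mathbf{x} \times \mathbf{y} =
\left(
\begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix},\
\begin{vmatrix} x_3 & y_3 \\ x_1 & y_1 \end{vmatrix},\
\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}
\right)^t,
\]

so

\[
\operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}) = \mathbf{x} \times \mathbf{y} \cdot \mathbf{z} =
\begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix} z_1
+ \begin{vmatrix} x_3 & y_3 \\ x_1 & y_1 \end{vmatrix} z_2
+ \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} z_3
= \det V.
\]

□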
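The identity between the scalar triple product and the determinant is easy to test on a concrete triple. In this sketch the helper names `cross`, `dot`, and `det3` are hypothetical, the vectors are arbitrary examples, and the $3 \times 3$ determinant is expanded by the rule of Sarrus rather than along the third column as in the proof.

```python
def cross(x, y):
    """Cross product of two vectors in R^3."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def det3(x, y, z):
    """Determinant of the 3 x 3 matrix whose columns are x, y, z (rule of Sarrus)."""
    return (x[0] * y[1] * z[2] + y[0] * z[1] * x[2] + z[0] * x[1] * y[2]
            - z[0] * y[1] * x[2] - y[0] * x[1] * z[2] - x[0] * z[1] * y[2])

x, y, z = (1, 2, 0), (0, 1, 3), (2, 0, 1)
triple = dot(cross(x, y), z)          # x cross y, dotted with z
print(triple, det3(x, y, z))          # equal, and positive: right-handed triple
print(dot(cross(y, x), z))            # swapping x and y reverses the orientation
```

Exchanging the first two vectors changes the sign of both quantities, matching the statement that the sign of the volume records the orientation.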


Orientation 


Volume multiplier 


Parallelograms in R 3 


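Corollary 2.12 The parallelepiped $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ has positive orientation if and only if
the linear map $V : \mathbb{R}^3 \to \mathbb{R}^3$ that maps the standard basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ to $\mathbf{x}, \mathbf{y}, \mathbf{z}$, in that
order, has $\det V > 0$.

Proof. We use two fundamental results of linear algebra: (a) a linear map is uniquely
defined by its action on a basis; and (b) the matrix $V$ whose columns are the coordinates
of $\mathbf{x}, \mathbf{y}, \mathbf{z}$, in that order, has $V(\mathbf{e}_1) = \mathbf{x}$, $V(\mathbf{e}_2) = \mathbf{y}$, $V(\mathbf{e}_3) = \mathbf{z}$. By the preceding
theorems, $\det V > 0$ if and only if $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ has positive orientation. □

Corollary 2.13 An ordered set of vectors $\{\mathbf{x}, \mathbf{y}, \mathbf{z}\}$ has positive orientation if and
only if it is the image of the standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ under a linear map $M$ with
$\det M > 0$. □

Corollary 2.14 If $M : \mathbb{R}^3 \to \mathbb{R}^3$ is a linear map and $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ is an oriented parallelepiped
in the source, then $M(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}) = M(\mathbf{x}) \wedge M(\mathbf{y}) \wedge M(\mathbf{z})$ is an oriented
parallelepiped in the target and

\[
\operatorname{vol} M(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}) = \det M \times \operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}).
\]

Proof. This is analogous to Theorem 2.8 (p. 42) and is proven the same way. □

Corollary 2.15 The volume multiplier for the linear map $M : \mathbb{R}^3 \to \mathbb{R}^3$ is $\det M$.
The map $M$ reverses orientation precisely when $\det M < 0$. □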


Suppose $\mathbf{x} \times \mathbf{y} \ne \mathbf{0}$. Then there is a linear map $L : \mathbb{R}^3 \to \mathbb{R}^3$ with positive determinant
for which $L(\mathbf{x}) = \mathbf{y}$, $L(\mathbf{y}) = \mathbf{x}$ (see Exercise 2.21). Consequently, orientation-preserving
linear maps of $\mathbb{R}^3$ need not preserve the orientation of 2-dimensional
parallelograms that lie in $\mathbb{R}^3$. (We still orient such objects; see pp. 388ff.) The sign
of $\operatorname{area}(\mathbf{x} \wedge \mathbf{y})$ has no intrinsic geometric significance in $\mathbb{R}^3$; thus we always take
$\operatorname{area}(\mathbf{x} \wedge \mathbf{y}) \ge 0$.

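There is a remarkable connection between $\mathbf{x} \wedge \mathbf{y}$ and its projections onto the three
coordinate planes. First of all, if $\mathbf{x} = (x_1, x_2, x_3)^t$, $\mathbf{y} = (y_1, y_2, y_3)^t$, then

\[
\operatorname{area}(\mathbf{x} \wedge \mathbf{y}) = \|\mathbf{x} \times \mathbf{y}\|
= \left\| \left(
\begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix},\
\begin{vmatrix} x_3 & y_3 \\ x_1 & y_1 \end{vmatrix},\
\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}
\right)^t \right\|
= \sqrt{
\begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix}^2
+ \begin{vmatrix} x_3 & y_3 \\ x_1 & y_1 \end{vmatrix}^2
+ \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}^2 }.
\]

Let $\mathbf{x}_i$ denote the projection of $\mathbf{x}$ onto the coordinate plane $u_i = 0$, $i = 1, 2, 3$, and
similarly for $\mathbf{y}_i$. Then $\mathbf{x}_i \wedge \mathbf{y}_i$ is a parallelogram in a 2-dimensional plane whose area
is therefore a simple $2 \times 2$ determinant:

\[
\operatorname{area}(\mathbf{x}_1 \wedge \mathbf{y}_1) = \begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix},
\qquad
\operatorname{area}(\mathbf{x}_2 \wedge \mathbf{y}_2) = \begin{vmatrix} x_3 & y_3 \\ x_1 & y_1 \end{vmatrix},
\qquad
\operatorname{area}(\mathbf{x}_3 \wedge \mathbf{y}_3) = \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}.
\]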
A "Pythagorean"
theorem


We can therefore rewrite our expression for $\operatorname{area}(\mathbf{x} \wedge \mathbf{y})$ as

\[
\operatorname{area}^2(\mathbf{x} \wedge \mathbf{y}) = \operatorname{area}^2(\mathbf{x}_1 \wedge \mathbf{y}_1) + \operatorname{area}^2(\mathbf{x}_2 \wedge \mathbf{y}_2) + \operatorname{area}^2(\mathbf{x}_3 \wedge \mathbf{y}_3);
\]

in other words, the square of the area of a parallelogram is equal to the sum of the
squares of the areas of its projections onto the three coordinate planes. We can think
of this as a "Pythagorean" theorem whose more usual form deals with lengths rather
than areas, but relates, in the same way, the length of a vector to the lengths of its
projections to the three coordinate axes.
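A numerical check makes the "Pythagorean" relation concrete. The sketch is illustrative only: the helper names `det2`, `projected_areas`, and `area_squared` are hypothetical, and the squared area of the parallelogram is computed independently of the cross product, from the dot products $\mathbf{x}\cdot\mathbf{x}$, $\mathbf{y}\cdot\mathbf{y}$, and $\mathbf{x}\cdot\mathbf{y}$, so that the comparison is not circular.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

def projected_areas(x, y):
    """Signed areas of the projections of x ^ y onto the three coordinate planes."""
    return (det2(x[1], y[1], x[2], y[2]),    # projection along the first axis
            det2(x[2], y[2], x[0], y[0]),    # projection along the second axis
            det2(x[0], y[0], x[1], y[1]))    # projection along the third axis

def area_squared(x, y):
    """Squared area of x ^ y, computed as (x.x)(y.y) - (x.y)^2."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return xx * yy - xy * xy

x, y = (1, 2, 0), (0, 1, 3)
areas = projected_areas(x, y)
print(areas, sum(a * a for a in areas), area_squared(x, y))
```

The sum of the squared projected areas matches the squared area of the parallelogram itself, even though one of the projected areas is negative.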

Geometry in $\mathbb{R}^n$, $n > 3$

Although we cannot visualize $\mathbb{R}^n$ directly when $n > 3$, we do carry over geometric
concepts by analogy. For example, we continue to say the vector $\mathbf{x}$ has length
$\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}}$ and the angle $\theta$ between the vectors $\mathbf{x}$ and $\mathbf{y}$ (assuming $\mathbf{x} \ne \mathbf{0} \ne \mathbf{y}$) is

\[
\theta = \arccos\left( \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|} \right).
\]

Of course, $\arccos q$ is defined only when $|q| \le 1$, so our definition of $\theta$ makes sense
only if $|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\| \, \|\mathbf{y}\|$ for all vectors $\mathbf{x}, \mathbf{y}$ in $\mathbb{R}^n$. This fact is established in the exercises.
In what follows, $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is the standard basis for $\mathbb{R}^n$. As you can see,
the definitions relate the orientation and volume of an $n$-dimensional parallelepiped
to the determinant of a certain $n \times n$ matrix. We review the definition of an $n \times n$
determinant in the exercises.

Definition 2.3 An ordered set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ has positive orientation if $\det V > 0$,
where $V : \mathbb{R}^n \to \mathbb{R}^n$ is the linear map defined by the conditions $V(\mathbf{e}_i) = \mathbf{v}_i$,
$i = 1, 2, \ldots, n$. The set has negative orientation if $\det V < 0$.


$n$-parallelepipeds


Orientation and volume 


n x n determinants 


Image and kernel 
subspaces are graphs 


Rank-nullity theorem 


We can now construct the analogue of a parallelepiped, and define its volume and 
orientation, by extending the wedge product as follows. 

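Definition 2.4 Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an ordered set of vectors in $\mathbb{R}^n$; the oriented
$n$-dimensional parallelepiped $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_n$ is the set of vectors

\[
\mathbf{w} = \sum_{i=1}^{n} t_i \mathbf{v}_i, \qquad 0 \le t_i \le 1, \quad i = 1, \ldots, n.
\]

Definition 2.5 The orientation of $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_n$ is the orientation of the ordered
set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$; its $n$-volume (or just volume) is

\[
\operatorname{vol}(\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_n) = \det V,
\]

where $V$ is the matrix whose $j$th column consists of the coordinates of $\mathbf{v}_j$; that is,
$V(\mathbf{e}_j) = \mathbf{v}_j$.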

The volume of an $n$-parallelepiped can be either positive, negative, or zero. If
the volume is zero, then $\det V = 0$, so the columns of $V$ (which are the coordinates
of the edges of the parallelepiped) are linearly dependent. The parallelepiped does
not fill out an $n$-dimensional region in $\mathbb{R}^n$. Our final statement about volumes is the
analogue of similar results in $\mathbb{R}^2$ and $\mathbb{R}^3$, and is proven the same way.

Theorem 2.16. The volume multiplier of the linear map $M : \mathbb{R}^n \to \mathbb{R}^n$ is $\det M$; that
is,

\[
\operatorname{vol} M(\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_n) = \operatorname{vol} M(\mathbf{v}_1) \wedge \cdots \wedge M(\mathbf{v}_n) = \det M \times \operatorname{vol}(\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_n)
\]

for every oriented $n$-parallelepiped $\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_n$. The map $M$ reverses orientation
precisely when $\det M < 0$. □
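Theorem 2.16 can be illustrated in dimension 4, where no picture is available. In this sketch the function names `det` and `matmul` are hypothetical, the matrices are arbitrary integer examples, and the determinant is computed by cofactor expansion along the first row, which is adequate only for small $n$. The columns of `V` are the edges of a parallelepiped, so the columns of the product `matmul(M, V)` are the edges of its image.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]   # delete row 0 and column j
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def matmul(a, b):
    """Matrix product a b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

M = [[1, 2, 0, 0],
     [0, 1, 3, 0],
     [0, 0, 1, 4],
     [5, 0, 0, 1]]
V = [[2, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 1, 3]]

print(det(V), det(M), det(matmul(M, V)))
```

Here the determinant of `M` is negative, so the image parallelepiped has the opposite orientation and its volume is the original volume multiplied by that determinant.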


2.3 Maps from $\mathbb{R}^n$ to $\mathbb{R}^p$, $n \ne p$

A good example of a map between spaces of the same dimension is a coordinate 
change. Of course, a coordinate change has to work both ways; that is, the map must 
be invertible. When the source and target have different dimensions, invertibility is 
out of the question, but the geometric action of such linear maps still has a simple 
description. When the source is larger, the map cannot be one-to-one: the kernel of 
the map must be a linear subspace of positive dimension in the source. When the 
target is larger, the map cannot be onto: the image must be a linear subspace of 
strictly smaller dimension than the target. As we show, each of these subspaces is 
the graph of a new linear map that is defined implicitly by the original one. 

When $L : \mathbb{R}^n \to \mathbb{R}^p$ is a linear map, the kernel, or null space, of $L$ is the linear
subspace $\ker L$ of the source $\mathbb{R}^n$ that consists of all vectors $\mathbf{v}$ for which $L(\mathbf{v}) = \mathbf{0}$.
The image of $L$ is the linear subspace $\operatorname{im} L$ of the target consisting of all vectors $\mathbf{x}$
of the form $\mathbf{x} = L(\mathbf{v})$, for some $\mathbf{v}$ in the source. We call $r = \dim \operatorname{im} L$ the rank of $L$
and $k = \dim \ker L$ its nullity. The rank-nullity theorem of linear algebra says that

\[
r + k = \text{rank of } L + \text{nullity of } L = \dim \text{source of } L = n.
\]

To analyze linear maps $L : \mathbb{R}^n \to \mathbb{R}^p$ for which $n \ne p$, let us first assume $n > p$.
Because the image is a linear subspace of the target, we always have $r \le p$. Because
$n - p > 0$, the rank-nullity theorem implies $k = n - r \ge n - p > 0$. In other words,
the kernel of $L$ has positive dimension, at least as large as $n - p$. Let us now look
more closely at $\ker L$.

We begin with an example in which $n = 3$ and $p = 1$, so $L$ has the general form
$x = L(u,v,w) = au + bv + cw$. How can we describe $\ker L$? To illustrate, suppose
$L(u,v,w) = u - 2v - 3w$. The kernel of $L$ is the locus of points in $(u,v,w)$-space that
satisfy the equation

\[
u - 2v - 3w = 0.
\]

The figure shows this locus is a (2-dimensional) plane through the origin. We can
solve the equation for $w$, for example, and get

\[
w = \frac{u - 2v}{3}.
\]

The original equation $u - 2v - 3w = 0$ therefore implies that $w$ is a (linear) function
of $u$ and $v$. Thus, we can view the plane in the figure as either a locus (i.e., the locus
of zeros of $L(u,v,w)$) or a graph (e.g., the graph of $w = (u - 2v)/3$).

Of course, this is not the only functional relation that is implicit in the equation
$u - 2v - 3w = 0$; we get others by solving for $u$ or $v$:

\[
u = 2v + 3w \quad\text{or}\quad v = \frac{u - 3w}{2}.
\]

In each case, however, precisely one of the variables is expressed in terms of the
other two.

Kernel $L$ has positive
dimension when $n > p$

Example: one equation
in three variables

Implicit functions

$\ker L = \operatorname{graph} M$

Graphs in general

Example: $p$ equations
in $p + k$ variables

Solving the equations

$\mathbf{0} = A\mathbf{v} = (B\ \ C)\begin{pmatrix} \mathbf{u} \\ \mathbf{w} \end{pmatrix} = B\mathbf{u} + C\mathbf{w}$


We can say the same about an arbitrary linear function of three variables:

\[
x = L(u,v,w) = au + bv + cw.
\]

We have already seen that $\dim \ker L$ is at least $n - p = 2$. If $a = b = c = 0$, then every
point satisfies the kernel equation $au + bv + cw = 0$, and $\dim \ker L = 3$. Otherwise,
at least one of the coefficients is different from zero. Suppose $c \ne 0$; then we can
solve the kernel equation for $w$ in terms of $u$ and $v$:

\[
w = -\frac{a}{c}\,u - \frac{b}{c}\,v = M(u,v).
\]

The linear function $M : \mathbb{R}^2 \to \mathbb{R}^1$ that expresses $w$ in terms of $u$ and $v$ is implicitly
defined by the equation $L(u,v,w) = 0$. In fact, the graph of $M$ is the locus of the
equation $L(u,v,w) = 0$. The dimension of a graph is equal to the dimension of its
source (see below); therefore $\dim \ker L = \dim \operatorname{graph} M = 2$.
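The passage from the kernel equation to the implicit function can be written out as a tiny routine. This is a sketch under stated assumptions: the names `implicit_w` and `L` are hypothetical, and exact rational arithmetic from the standard `fractions` module is used so that membership in the kernel can be tested with an exact equality.

```python
from fractions import Fraction

def implicit_w(a, b, c):
    """Return M with w = M(u, v) solving a*u + b*v + c*w = 0; requires c != 0."""
    if c == 0:
        raise ValueError("c must be nonzero to solve for w")
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return lambda u, v: -(a / c) * u - (b / c) * v

def L(u, v, w):
    """The example map L(u, v, w) = u - 2v - 3w."""
    return u - 2 * v - 3 * w

M = implicit_w(1, -2, -3)                 # coefficients of u - 2v - 3w = 0
for u, v in [(5, 1), (0, 3), (-4, 7)]:
    w = M(u, v)
    print((u, v, w), L(u, v, w))          # every point (u, v, M(u, v)) lies in ker L
```

Each printed value of $L$ is zero, so the sampled points of the graph of $M$ all lie in the kernel, as the text asserts for the whole graph.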

Because we have found that the kernel of one linear map can be the graph of 
another, we pause to state some facts about graphs generally. 

Definition 2.6 The graph of an arbitrary map $f : X \to Y$ is the subset of the product
$X \times Y$ that is defined by

\[
\operatorname{graph} f = \{ (x, f(x)) \mid x \in X \}.
\]

The definition makes it clear that there is a 1-1 correspondence between the source
$X$ and $\operatorname{graph} f$. If the map is linear, we can say more.

Theorem 2.17. If $L : \mathbb{R}^n \to \mathbb{R}^p$ is linear, then $\operatorname{graph} L$ is a linear subspace of the
product $\mathbb{R}^n \times \mathbb{R}^p = \mathbb{R}^{n+p}$ and $\dim \operatorname{graph} L = n$.

Proof. Do this as an exercise. □

Now consider a general linear map $L : \mathbb{R}^n \to \mathbb{R}^p : \mathbf{v} \mapsto \mathbf{x}$ with $n > p$. To begin,
assume that $L$ has maximal rank, so $r = p$ and $k = \dim \ker L = n - p$. If $A = (a_{ij})$ is
the $p \times n$ matrix representing $L$, then the vector kernel equation $L(\mathbf{v}) = \mathbf{0}$ translates
into a system of $p$ ordinary equations in the $n = p + k$ unknowns $\mathbf{v} = (v_1, v_2, \ldots, v_n)$
and the coefficients $a_{ij}$:

\[
\begin{aligned}
a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n &= 0, \\
a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n &= 0, \\
&\ \ \vdots \\
a_{p1} v_1 + a_{p2} v_2 + \cdots + a_{pn} v_n &= 0.
\end{aligned}
\]

Our previous example suggests that we should be able to solve these equations so
as to express $p$ of the unknowns as linear functions of the remaining $k = n - p$.

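The implicit functions

$\ker L = \operatorname{graph} M$


To solve the equations, note first that the rank of $L$ is the number of linearly
independent columns of $A$. By our assumption, $A$ must have $p$ linearly independent
columns. By rearranging them (and with them the variables $v_j$), if necessary, we
can assume that the final $p$ columns of $A$ are linearly independent. They form an
invertible $p \times p$ submatrix, $C$. The initial $n - p = k$ columns form a $p \times k$ submatrix
$B$. With these identifications, the kernel equations take the form

\[
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
=
\left( \begin{array}{ccc|ccc}
a_{11} & \cdots & a_{1k} & a_{1,k+1} & \cdots & a_{1,k+p} \\
a_{21} & \cdots & a_{2k} & a_{2,k+1} & \cdots & a_{2,k+p} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{p1} & \cdots & a_{pk} & a_{p,k+1} & \cdots & a_{p,k+p}
\end{array} \right)
\begin{pmatrix} \mathbf{u} \\ \mathbf{w} \end{pmatrix},
\qquad
\mathbf{u} = (v_1, \ldots, v_k)^t, \quad \mathbf{w} = (v_{k+1}, \ldots, v_{k+p})^t,
\]

or just $B\mathbf{u} + C\mathbf{w} = \mathbf{0}$ in terms of matrices. Because $C$ is invertible, we can solve
for $\mathbf{w}$:

\[
\mathbf{w} = -C^{-1}B\mathbf{u} = M(\mathbf{u}),
\qquad
\begin{aligned}
w_1 &= \beta_{11} u_1 + \cdots + \beta_{1k} u_k, \\
w_2 &= \beta_{21} u_1 + \cdots + \beta_{2k} u_k, \\
&\ \ \vdots \\
w_p &= \beta_{p1} u_1 + \cdots + \beta_{pk} u_k.
\end{aligned}
\]

This is what we want: the equation $\mathbf{w} = M(\mathbf{u})$ expresses $p$ of the variables as linear
functions of the remaining $k$. The $p \times k$ matrix $-C^{-1}B = (\beta_{ij})$ that represents $M$ is
constructed from certain submatrices of the matrix $A$ that represents $L$. Finally, the
argument shows that the graph of $M$ is precisely the kernel of $L$, because

\[
\mathbf{w} = M(\mathbf{u}) \quad\text{if and only if}\quad L(\mathbf{u}, \mathbf{w}) = B\mathbf{u} + C\mathbf{w} = \mathbf{0}.
\]

That is, a point $(\mathbf{u}, \mathbf{w})$ is in the graph of $M$ (so $\mathbf{w} = M(\mathbf{u})$) if and only if it is in the
kernel of $L$.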
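The recipe of splitting the matrix into a block of the first $k$ columns and an invertible block of the final $p$ columns, then solving for the last $p$ variables, can be sketched as follows. The names `solve`, `kernel_graph_map`, and `apply` are hypothetical, the $2 \times 4$ matrix is an arbitrary example, and instead of forming the inverse of the square block explicitly the sketch solves the equivalent linear system by Gauss-Jordan elimination over the rationals.

```python
from fractions import Fraction

def solve(C, rhs):
    """Solve C w = rhs by Gauss-Jordan elimination; C must be square and invertible."""
    p = len(C)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(C, rhs)]
    for col in range(p):
        pivot = next((i for i in range(col, p) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("C is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]      # scale pivot row to 1
        for i in range(p):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]

def kernel_graph_map(A, p):
    """Return M with w = M(u) solving B u + C w = 0, where C is the final p columns of A."""
    k = len(A[0]) - p
    def M(u):
        rhs = [-sum(row[j] * u[j] for j in range(k)) for row in A]   # -B u
        return solve([row[k:] for row in A], rhs)
    return M

def apply(A, v):
    """Image of the vector v under the matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 1, 0],
     [3, 0, 1, 1]]          # p = 2 equations in k + p = 4 unknowns
M = kernel_graph_map(A, p=2)
u = [2, -1]
w = M(u)
print(w, apply(A, u + w))   # the point (u, M(u)) is sent to zero by A
```

Solving the system rather than inverting the block is a design choice made here for brevity; the resulting map is the same one the text calls $M$.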

Notice, incidentally, how our general result echoes what we found in the first
example, in which $L(u,v,w) = au + bv + cw$. We had $A = (a\ \ b\ \ c)$ (a $1 \times 3$ matrix),
$B = (a\ \ b)$, and $C = (c)$, implying

\[
M = -C^{-1}B = -\frac{1}{c}\,(a\ \ b) = \left( -\frac{a}{c}\ \ \ -\frac{b}{c} \right).
\]

The following theorem summarizes our result in both an algebraic form involving
equations, and a geometric form involving maps. The condition that the original
equations are linearly independent implies that the rank of the associated linear map
is $p$.


Theorem 2.18. Algebraically: A set of $p$ linearly independent linear equations in
$k + p$ variables implicitly defines $p$ of the variables as linear functions of the remaining
$k$ variables. Geometrically: The kernel of a linear map $L : \mathbb{R}^{k+p} \to \mathbb{R}^p$ of
maximal rank $p$ is the graph of another linear map $M : \mathbb{R}^k \to \mathbb{R}^p$. □


Every onto map
is a projection


One such map $L : \mathbb{R}^{k+p} \to \mathbb{R}^p$ of maximal rank $p$ is just the identity on the last $p$
variables:

\[
L:\ \begin{cases}
w_1 = v_{k+1} = x_1, \\
w_2 = v_{k+2} = x_2, \\
\quad\vdots \\
w_p = v_{k+p} = x_p,
\end{cases}
\qquad
L = \begin{pmatrix} O_{p \times k} & I_{p \times p} \end{pmatrix}.
\]

The $p \times k$ zero matrix $O_{p \times k}$ eliminates the $\mathbf{u}$ variables $u_1 = v_1, \ldots, u_k = v_k$ from
the formulas. Geometrically, $L$ projects $\mathbb{R}^{k+p} = \mathbb{R}^k \times \mathbb{R}^p$ onto its second factor. It
projects the first factor $\mathbb{R}^k$ (the kernel of $L$) to $\mathbf{0}$, and it projects the parallel translate
of $\mathbb{R}^k$ by the vector $(\mathbf{u}, \mathbf{w})$ (for an arbitrary $\mathbf{u}$) to the point $\mathbf{w} = \mathbf{x}$ in the target.
Although this example may seem special, the following theorem shows that, in a
sense, it is the only possibility. Because the theorem is geometric, it is helpful to
write the linear map in its original form $L : \mathbb{R}^n \to \mathbb{R}^p$, keeping in mind it is an onto
map with $n \ge p$.

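Theorem 2.19. If the linear map $L : \mathbb{R}^n \to \mathbb{R}^p$ is onto, then there is a linear coordinate
change $H$ in the source $\mathbb{R}^n$ that transforms $L$ into the projection that is the
identity on the final $p$ variables.

Proof. We assume, as in the proof of Theorem 2.18, that variables in the source
have been permuted so that $L$ has the form

\[
L = \begin{pmatrix} B_{p \times k} & C_{p \times p} \end{pmatrix},
\qquad
\mathbf{x} = L(\mathbf{u}, \mathbf{w}) = (B\ \ C) \begin{pmatrix} \mathbf{u} \\ \mathbf{w} \end{pmatrix} = B\mathbf{u} + C\mathbf{w},
\]

where $k = n - p \ge 0$ and the square submatrix $C$ is invertible. Define $H : \mathbb{R}^n \to \mathbb{R}^n :
(\bar{\mathbf{u}}, \bar{\mathbf{w}}) \mapsto (\mathbf{u}, \mathbf{w})$ as the pullback

\[
H:\ \begin{cases}
\mathbf{u} = \bar{\mathbf{u}}, \\
\mathbf{w} = -C^{-1}B\bar{\mathbf{u}} + C^{-1}\bar{\mathbf{w}},
\end{cases}
\qquad
H = \begin{pmatrix} I_{k \times k} & O_{k \times p} \\ -C^{-1}B & C^{-1}_{p \times p} \end{pmatrix}.
\]

Now $H$ is invertible,

\[
H^{-1} = \begin{pmatrix} I & O \\ B & C \end{pmatrix},
\]

so $H$ is a valid coordinate change. Applying $H$ to $\mathbf{x} = B\mathbf{u} + C\mathbf{w}$ gives

\[
\mathbf{x} = B\bar{\mathbf{u}} + C\left[ -C^{-1}B\bar{\mathbf{u}} + C^{-1}\bar{\mathbf{w}} \right] = B\bar{\mathbf{u}} - B\bar{\mathbf{u}} + \bar{\mathbf{w}} = \bar{\mathbf{w}}
\]

(i.e., $\mathbf{x} = \bar{\mathbf{w}}$). The matrix for $L$ in the new coordinates is

\[
\bar{L} = LH = (B\ \ C) \begin{pmatrix} I & O \\ -C^{-1}B & C^{-1} \end{pmatrix} = (O\ \ I).
\]

Thus $\bar{L}$ is the identity on the second component of $(\bar{\mathbf{u}}, \bar{\mathbf{w}})$; it projects $(\bar{\mathbf{u}}, \bar{\mathbf{w}})$ to $\bar{\mathbf{w}}$. □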

Corollary 2.20 If the linear map $L : \mathbb{R}^n \to \mathbb{R}^p$ is onto, and $Y$ is a linear subspace
of the target of dimension $q$, then its preimage

\[
L^{-1}(Y) = \{ \mathbf{v} \text{ in } \mathbb{R}^n : L(\mathbf{v}) \text{ is in } Y \}
\]

is a linear subspace of dimension $q + k$, where $k = n - p \ge 0$.

Proof. The theorem provides new coordinates $(\bar{\mathbf{u}}, \bar{\mathbf{w}})$ in $\mathbb{R}^k \times \mathbb{R}^p = \mathbb{R}^n$ in which $L$
becomes a projection onto the second factor. Then

\[
L^{-1}(Y) = \{ (\bar{\mathbf{u}}, \bar{\mathbf{w}}) : \bar{\mathbf{w}} \text{ is in } Y \} = \mathbb{R}^k \times Y,
\]

because $\bar{\mathbf{u}}$ is an arbitrary point in $\mathbb{R}^k$. Hence $\dim L^{-1}(Y) = k + \dim Y$. Standard arguments
in linear algebra show that $L^{-1}(Y)$ is a linear subspace. □

It is intuitively clear that, if $L$ projects a larger space onto a smaller one, the
pullback $L^{-1}(Y)$ will always be larger than the original $Y$. The corollary says that
the difference is equal to the difference in dimension of the spaces themselves:

\[
\dim L^{-1}(Y) - \dim Y = k = \dim \mathbb{R}^n - \dim \mathbb{R}^p.
\]

But this means

\[
\dim \mathbb{R}^n - \dim L^{-1}(Y) = \dim \mathbb{R}^p - \dim Y.
\]

Definition 2.7 The codimension of a linear subspace $Y$ of a vector space $V$ is

\[
\operatorname{codim} Y = \dim V - \dim Y.
\]

Think of codimension as "dimension of the complement" or "complement's dimension."
A complement of $Y$ in $V$ is a linear subspace $Z$ for which $V = Y \times Z$. Obviously,
$\dim Z = \dim V - \dim Y = \operatorname{codim} Y$. In the previous corollary, it is thus more
useful to work with the codimension of a linear subspace than its dimension, because
pullback alters dimension but preserves codimension.

Corollary 2.21 If the linear map $L : \mathbb{R}^n \to \mathbb{R}^p$ is onto, and $Y$ is a linear subspace
of $\mathbb{R}^p$, then $\operatorname{codim} L^{-1}(Y) = \operatorname{codim} Y$. □

Theorem 2.19 provides a classification of onto linear maps that is similar to the 
classification of linear maps of the plane by Theorem 2.6 (p. 40). There we found 
several classes, each with a typical representative (that shares eigenvalues with all 
members of the class). Here there is only a single class, and projection is chosen as 
the typical representative of that class. 

Codimension

Classifying linear maps
that are onto

Assume rank is
not maximal


It remains to consider the kernel of $L : \mathbb{R}^n \to \mathbb{R}^p$ when the rank of $L$ is no longer
maximal. In this case it turns out not to matter that $n > p$. We can illustrate this with
a simple example.

\[
\begin{aligned}
u - w &= 0, \\
v - 2w &= 0, \\
u + v - 3w &= 0, \\
u - v + w &= 0.
\end{aligned}
\]

These four equations in three variables describe the kernel of a particular linear map
$L : \mathbb{R}^3 \to \mathbb{R}^4$. The maximum possible rank of $L$ is 3. However, the actual rank is
only $r = 2$: only two of the four equations are linearly independent; the other two
are linear combinations of those. By the rank-nullity theorem, the dimension of the
kernel of $L$ is $k = n - r = 1$. We expect, therefore, that the kernel of $L$ is the graph
of some other linear map $M : \mathbb{R}^1 \to \mathbb{R}^2$. Indeed, the kernel equations imply

\[
M:\ \begin{cases} u = w, \\ v = 2w. \end{cases}
\]

This is the linear map we seek. The following theorem generalizes this example; it
makes no assumption about the relative sizes of $n$ and $p$.
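The claims made about this example, that the rank is 2 and that the kernel is the graph of $u = w$, $v = 2w$, can be confirmed by elimination. In the sketch below the function names `rank` and `M` are hypothetical; the rows of the matrix are the coefficients of the four equations in the order $(u, v, w)$, and the elimination is carried out with exact fractions.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                    # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                         # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

L = [[1, 0, -1],      # u      -  w = 0
     [0, 1, -2],      #      v - 2w = 0
     [1, 1, -3],      # u +  v - 3w = 0
     [1, -1, 1]]      # u -  v +  w = 0

def M(w):
    """The implicit map: u = w, v = 2w."""
    return (w, 2 * w)

print(rank(L))                                             # rank r
u, v = M(5)
print([row[0] * u + row[1] * v + row[2] * 5 for row in L])  # all four equations hold
```

With rank 2 and three unknowns, the rank-nullity theorem gives a one-dimensional kernel, which is the line traced out by the points $(w, 2w, w)$.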


$\ker L$ is a graph
when $r < n$

Theorem 2.22 (Linear implicit function theorem). If $L : \mathbb{R}^n \to \mathbb{R}^p$ has rank $r < n$,
then $\ker L$ in $\mathbb{R}^n$ is the graph of a linear map $M : \mathbb{R}^{n-r} \to \mathbb{R}^r$.


Proof. We consider cases; in every case, $r \le p$. If $r = p$, Theorem 2.18 applies. If
$r = 0$, then $L$ is identically zero and $\ker L = \mathbb{R}^n$. Therefore, the "zero" linear map
$M : \mathbb{R}^n \to \mathbb{R}^0 : \mathbf{v} \mapsto \mathbf{0}$ has $\operatorname{graph} M = \ker L$.

The only remaining possibility is $0 < r < p$. Then $q = p - r > 0$ of the equations
that determine $\ker L$ depend linearly on the remaining $r$ equations. Select $q$ dependent
equations and discard them. Then use the remaining $p - q = r > 0$ equations to
define a new linear map $L^* : \mathbb{R}^n \to \mathbb{R}^r$. Because the discarded kernel equations add
no information, $\ker L^* = \ker L$.

By construction, $L^*$ does have maximal rank $r$, so Theorem 2.18 applies again:
the kernel equations for $L^*$ implicitly define $r$ of the variables $v_1, \ldots, v_n$ as linear
functions of the remaining $n - r$ variables. In other words, $\ker L^* = \ker L$ is the graph
of a linear map $M : \mathbb{R}^{n-r} \to \mathbb{R}^r$. □



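One linear map $L : (u_1, \ldots, u_k, w_1, \ldots, w_r) \mapsto (x_1, \ldots, x_q, y_1, \ldots, y_r)$ of rank $r$ is
given by

\[
L:\ \begin{cases}
x_1 = 0, \\
\quad\vdots \\
x_q = 0, \\
y_1 = w_1, \\
\quad\vdots \\
y_r = w_r,
\end{cases}
\qquad
L = \begin{pmatrix} O_{q \times k} & O_{q \times r} \\ O_{r \times k} & I_{r \times r} \end{pmatrix}.
\]

To make this different from the previous example (p. 49), it is sufficient to require
$q > 0$. The kernel of $L$ consists of the points $(\mathbf{u}, \mathbf{0})$, and the image of $L$ consists of
the points $(\mathbf{0}, \mathbf{y})$. If we write the source as $\mathbb{R}^k \times \mathbb{R}^r$ and the target as $\mathbb{R}^q \times \mathbb{R}^r$, then $L$
is the projection that is the identity on the second components:

\[
L : (\mathbf{u}, \mathbf{w}) \mapsto (\mathbf{0}, \mathbf{w}).
\]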


The next theorem shows that this example is essentially the only possibility. 

Theorem 2.23. Let $L : \mathbb{R}^n \to \mathbb{R}^p$ be a linear map of rank $r$, and let $k = n - r$, $q =
p - r$. Then there are coordinates in the source and target for which the matrix
representing $L$ is

\[
\Pi = \begin{pmatrix} O_{q \times k} & O_{q \times r} \\ O_{r \times k} & I_{r \times r} \end{pmatrix}.
\]

Proof. We obtain coordinates for the source and for the target in which the map $L$ is
represented by multiplication by the given matrix $\Pi$. According to the rank-nullity
theorem, $k = n - r$ is the dimension of the kernel of $L$. Let $\{U_1, \ldots, U_k\}$ be any basis
for the kernel; add additional vectors $W_1, \ldots, W_r$ so that

\[
\{U_1, \ldots, U_k, W_1, \ldots, W_r\}
\]

is a basis for the entire source $\mathbb{R}^{k+r}$. Then any vector $\mathbf{v}$ in $\mathbb{R}^{k+r}$ can be written as

\[
\mathbf{v} = u_1 U_1 + \cdots + u_k U_k + w_1 W_1 + \cdots + w_r W_r,
\]

and (inasmuch as every $L(U_i) = \mathbf{0}$)

\[
L(\mathbf{v}) = w_1 L(W_1) + \cdots + w_r L(W_r).
\]

Because $\mathbf{v}$ is an arbitrary vector in the source, the vectors $L(\mathbf{v})$ constitute the entire
image of $L$, and the equation for $L(\mathbf{v})$ shows that the vectors $L(W_j)$ span the image.
Because the image has dimension $r$, the vectors $Y_j = L(W_j)$, $j = 1, \ldots, r$, must, in
fact, form a basis for the image. Add additional vectors $X_1, \ldots, X_q$ so that

\[
\{X_1, \ldots, X_q, Y_1, \ldots, Y_r\}
\]

is a basis for the entire target $\mathbb{R}^{q+r}$. Then, in terms of these two bases, the coordinates
of $\mathbf{v}$ and $L(\mathbf{v})$ are

\[
\begin{aligned}
\mathbf{v} &\leftrightarrow (u_1, \ldots, u_k, w_1, \ldots, w_r) = (\mathbf{u}, \mathbf{w}), \\
L(\mathbf{v}) &\leftrightarrow (0, \ldots, 0, w_1, \ldots, w_r) = (\mathbf{0}, \mathbf{w}).
\end{aligned}
\]

Multiplication by $\Pi$ gives

\[
\Pi \begin{pmatrix} \mathbf{u} \\ \mathbf{w} \end{pmatrix}
= \begin{pmatrix} O & O \\ O & I \end{pmatrix} \begin{pmatrix} \mathbf{u} \\ \mathbf{w} \end{pmatrix}
= \begin{pmatrix} \mathbf{0} \\ \mathbf{w} \end{pmatrix};
\]

thus $\Pi$ does indeed represent $L$ in terms of these coordinates. □


We now switch our attention from kernels to images. We show that when imL 
is a proper subspace of the target of L, it too can be thought of as the graph of a 
linear map implicitly defined by L. As we did for kernels, we assume first that L has 
maximal rank. 

We begin with an example. Consider the linear map L : M 2 —> R 3 : (u, v) i—> (x,y,z) 
given by 


When is imZ, a graph? 


Example: 
three equations 
in two variables 


au + bv = x, 
cu + dv = y, 
eu + fv = z. 
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2 Geometry of Linear Maps 


The implicit function 


Example: n+q 
equations in n variables 


The implicit functions 


If L has maximum rank, namely 2, then precisely two of the equations are linearly 
independent. By rearranging them, if necessary, we may assume the first two equa¬ 
tions are. Then we can solve these two equations for u and v in terms of x and y. It 
is perhaps easiest to see this if we work with matrices: 

:i) (:)=©■ 


Here D = ad — be, the determinant; linear independence of the first two equations 
implies Dfi 0. We can now express z directly in terms of x andy: 


dx—by —cx + ay 

z = e — - Vf- 


de — cf af—be 


D 


D 


D 


D 


■y = M(x,y). 


In geometric terms, the image of $L$ is a plane in the target, $\mathbb{R}^3$. We have shown
that if $(x, y, z)$ are the coordinates of a point in this plane, then $z$ is not independent
of $x$ and $y$, but depends linearly on them. The points of the image plane are of the
form $(x, y, M(x, y))$; in other words, the image of $L$ is the graph of $M$.
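The computation above can be checked numerically. The following is a minimal sketch, assuming integer coefficients chosen only for illustration; the function names `implicit_map` and `image_point` are hypothetical and do not come from the text. It computes the two coefficients of $M$ from $a, \ldots, f$ and confirms that every image point $(x, y, z)$ satisfies $z = M(x, y)$.

```python
from fractions import Fraction


def implicit_map(a, b, c, d, e, f):
    """Return (m1, m2) such that z = m1*x + m2*y on the image of L.

    Assumes the first two equations are independent, i.e. D = ad - bc != 0.
    """
    D = Fraction(a * d - b * c)
    if D == 0:
        raise ValueError("the first two equations are not linearly independent")
    return (Fraction(d * e - c * f) / D, Fraction(a * f - b * e) / D)


def image_point(a, b, c, d, e, f, u, v):
    """Apply L to (u, v), giving the point (x, y, z) in the target."""
    return (a * u + b * v, c * u + d * v, e * u + f * v)


coeffs = (1, 2, 3, 4, 5, 6)  # illustrative values of a, b, c, d, e, f
m1, m2 = implicit_map(*coeffs)
for u, v in [(1, 0), (0, 1), (2, -3)]:
    x, y, z = image_point(*coeffs, u, v)
    # Every image point lies on the graph of M.
    assert z == m1 * x + m2 * y
```

Exact rational arithmetic is used so that the equality test is not disturbed by rounding.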

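To generalize our example, take a linear map $L : \mathbb{R}^n \to \mathbb{R}^{n+q}$, $q > 0$, with maximal
rank $r = n$. Let $A = (a_{ij})$ be the matrix representing $L$. Assume, by rearranging the
rows of $A$, if necessary, that the first $n$ rows are linearly independent. Then, if we
write the equation $L(\mathbf{v}) = A\mathbf{v} = \mathbf{x}$ in the form

\[ \begin{pmatrix} B \\ C \end{pmatrix} \mathbf{v} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \\ a_{n+1,1} & \cdots & a_{n+1,n} \\ \vdots & & \vdots \\ a_{n+q,1} & \cdots & a_{n+q,n} \end{pmatrix} \mathbf{v} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \\ z_{n+1} \\ \vdots \\ z_{n+q} \end{pmatrix} = \begin{pmatrix} \mathbf{y} \\ \mathbf{z} \end{pmatrix}, \]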

we can express this in terms of the submatrices $B$ and $C$ as a pair of matrix equations:

\[ B\mathbf{v} = \mathbf{y}, \qquad C\mathbf{v} = \mathbf{z}. \]

We rearranged the rows of $A$ to guarantee that the $n \times n$ submatrix $B$ is invertible.
Hence we can solve for $\mathbf{v}$ in terms of $\mathbf{y}$: $\mathbf{v} = B^{-1}\mathbf{y}$, and we can then express $\mathbf{z}$ in terms
of $\mathbf{y}$:

\[ \mathbf{z} = CB^{-1}\mathbf{y}. \]

In terms of the components $\mathbf{y}$, $\mathbf{z}$ of $\mathbf{x}$, the equation $\mathbf{z} = CB^{-1}\mathbf{y}$ takes the form

\[ \begin{aligned} z_{n+1} &= \gamma_{11} y_1 + \cdots + \gamma_{1n} y_n, \\ z_{n+2} &= \gamma_{21} y_1 + \cdots + \gamma_{2n} y_n, \\ &\ \,\vdots \\ z_{n+q} &= \gamma_{q1} y_1 + \cdots + \gamma_{qn} y_n. \end{aligned} \]

In this way we have expressed the image of $L$ as a system of $q$ linear equations in
$n$ variables. The coefficients $\gamma_{ij}$ in these equations are the components of the $q \times n$
matrix $CB^{-1}$ that represents a certain linear map $M : \mathbb{R}^n \to \mathbb{R}^q$. We see that a point
$(\mathbf{y}, \mathbf{z})$ is in the image of $L$ if and only if $\mathbf{z} = M(\mathbf{y})$, that is, if and only if it is of the
form $(\mathbf{y}, M(\mathbf{y}))$. These are precisely the points of the graph of $M$.

Theorem 2.24. Suppose the linear map $L : \mathbb{R}^n \to \mathbb{R}^{n+q}$ has maximal rank $n$. Then
the image of $L$ in $\mathbb{R}^{n+q}$ is the graph of a linear map $M : \mathbb{R}^n \to \mathbb{R}^q$. □

In these circumstances, the image $L(\mathbb{R}^n)$ is $n$-dimensional. Therefore, if $P$ is a
parallelepiped in the source and $L(P)$ is its image, both are $n$-dimensional, and both
have $n$-volume. It is natural, then, to ask what the volume multiplier for $L$ is. Of
course, because the target dimension is larger than $n$, the image parallelepiped $L(P)$
cannot be oriented and the sign of its $n$-volume will have no meaning. Thus, we
always understand the volume multiplier for $L$ to be nonnegative. Let us consider
first the special case $L : \mathbb{R}^2 \to \mathbb{R}^3$ when the image is a 2-dimensional plane.

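Theorem 2.25. Suppose $L : \mathbb{R}^2 \to \mathbb{R}^3$ has maximal rank 2 and is represented by the
$3 \times 2$ matrix $(a_{ij})$, $i = 1, 2, 3$, $j = 1, 2$. Then

\[ \text{area multiplier of } L = \sqrt{ \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}^2 + \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}^2 + \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}^2 }. \]

Proof. By linearity, the area multiplier will equal the area of the image of a parallelogram
of unit area. In particular, take $P$ to be the unit square

\[ P = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \wedge \begin{pmatrix} 0 \\ 1 \end{pmatrix}; \quad \text{then} \quad L(P) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} \wedge \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix}. \]

It follows from page 44 that the area of the parallelogram $L(P)$ is

\[ \sqrt{ \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}^2 + \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}^2 + \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}^2 }. \]

□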


According to Exercise 2.26c (p. 64), the area of the same parallelogram $L(P)$
can also be written as $\sqrt{\det L^\dagger L}$. More generally, then, Exercise 2.36 and the discussion
leading up to it establish the following (“Pythagorean”) theorem.

Theorem 2.26. Suppose $L : \mathbb{R}^n \to \mathbb{R}^{n+q}$ has maximal rank $n$; then the $n$-volume
multiplier for $L$ is $\sqrt{\det L^\dagger L}$. □
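The two expressions for the area multiplier of a map $\mathbb{R}^2 \to \mathbb{R}^3$ can be compared on a concrete matrix. This is an illustrative sketch only: the helper names and the sample matrix are assumptions made here, not taken from the text. One function sums the squared $2 \times 2$ minors (Theorem 2.25); the other evaluates $\sqrt{\det L^\dagger L}$ (Theorem 2.26 with $n = 2$).

```python
import math


def minor(A, i, k):
    """2x2 determinant formed from rows i and k of a 3x2 matrix A."""
    return A[i][0] * A[k][1] - A[i][1] * A[k][0]


def area_multiplier_minors(A):
    """Square root of the sum of the squares of the three 2x2 minors."""
    return math.sqrt(minor(A, 1, 2) ** 2 + minor(A, 0, 2) ** 2 + minor(A, 0, 1) ** 2)


def area_multiplier_gram(A):
    """Square root of det(A^T A), where A^T A holds dot products of the columns."""
    cols = list(zip(*A))
    g = [[sum(x * y for x, y in zip(c1, c2)) for c2 in cols] for c1 in cols]
    return math.sqrt(g[0][0] * g[1][1] - g[0][1] * g[1][0])


A = [[1, 0], [0, 1], [2, 2]]  # illustrative matrix with columns (1,0,2) and (0,1,2)
assert math.isclose(area_multiplier_minors(A), area_multiplier_gram(A))
```

For this sample matrix the minors are $-2$, $2$, and $1$, so both functions return 3.0.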

Because the rank of $L$ equals the dimension of the source, the kernel of $L$ is zero
so $L$ is a 1-1 map. The next theorem says we can split the target into two factors
so that $L$ is just the identity mapping onto the first factor. This is a special case of
Theorem 2.23, but it is worth having another proof that follows different lines.

Theorem 2.27. Suppose $L : \mathbb{R}^n \to \mathbb{R}^{n+q}$ is 1-1; then a coordinate change $H$ in the
target transforms $L$ into the matrix

\[ \bar{L} = \begin{pmatrix} I \\ O \end{pmatrix}. \]

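Proof. To begin, we assume (as in the proof of Theorem 2.24) that $L$ has the matrix
form

\[ L = \begin{pmatrix} B \\ C \end{pmatrix}, \]

that is,

\[ \begin{aligned} B\mathbf{v} &= \mathbf{y}, \\ C\mathbf{v} &= \mathbf{z}, \end{aligned} \]

with $B$ an $n \times n$ invertible matrix. Define $H : \mathbb{R}^{n+q} \to \mathbb{R}^{n+q}$ by

\[ H : \begin{cases} \bar{\mathbf{y}} = B^{-1}\mathbf{y}, \\ \bar{\mathbf{z}} = -CB^{-1}\mathbf{y} + \mathbf{z}, \end{cases} \qquad H = \begin{pmatrix} B^{-1} & O \\ -CB^{-1} & I \end{pmatrix}. \]

Because $\det H = \det(B^{-1}) \neq 0$, $H$ is a valid coordinate change. Moreover,

\[ \bar{L} = HL = \begin{pmatrix} B^{-1} & O \\ -CB^{-1} & I \end{pmatrix} \begin{pmatrix} B \\ C \end{pmatrix} = \begin{pmatrix} I \\ O \end{pmatrix}; \qquad \begin{aligned} \bar{\mathbf{y}} &= B^{-1}(B\mathbf{v}) = \mathbf{v}, \\ \bar{\mathbf{z}} &= -CB^{-1}(B\mathbf{v}) + C\mathbf{v} = \mathbf{0}. \end{aligned} \]

□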
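The block computation in this proof can be tried on a small case. The sketch below is an assumption-laden illustration: it fixes $n = 2$, uses a hand-written $2 \times 2$ inverse, and the names `matmul`, `inverse_2x2`, and `target_change` are hypothetical. It builds $H$ from $B$ and $C$ and checks that $HL$ is the identity block stacked on a zero block.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse_2x2(B):
    """Exact inverse of a 2x2 matrix; raises if B is singular."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("B is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def target_change(B, C):
    """Build H = [[B^-1, O], [-C B^-1, I]] for a 2x2 block B and a q x 2 block C."""
    q = len(C)
    Binv = inverse_2x2(B)
    CBinv = matmul(C, Binv)
    top = [row + [Fraction(0)] * q for row in Binv]
    bottom = [[-entry for entry in CBinv[i]] + [Fraction(int(i == j)) for j in range(q)]
              for i in range(q)]
    return top + bottom


B = [[1, 2], [3, 4]]  # illustrative invertible block
C = [[5, 6]]          # illustrative remaining row, so q = 1
H = target_change(B, C)
L_bar = matmul(H, B + C)  # B + C stacks the rows of B above those of C
assert L_bar == [[1, 0], [0, 1], [0, 0]]
```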


By analogy with projections, a linear map with the special form of $\bar{L}$ is sometimes
called an injection. The theorem thus says that every 1-1 linear map is (equivalent
to) an injection. The following corollary stands in the same relation to Theorem 2.27
that Corollaries 2.20 and 2.21 do to Theorem 2.19. It says that a 1-1 linear map
preserves dimension under push-forward.

Corollary 2.28. If the linear map $L : \mathbb{R}^n \to \mathbb{R}^p$ is 1-1, and $X$ is a linear subspace of
dimension $k$ in the source, then $L(X)$ is a linear subspace of the same dimension $k$
in the target.

Proof. In terms of the $(\bar{\mathbf{y}}, \bar{\mathbf{z}})$ coordinates in the proof of Theorem 2.27, the linear
map $L$ becomes the injection $(\bar{\mathbf{y}}, \bar{\mathbf{z}}) = \bar{L}(\mathbf{v}) = (\mathbf{v}, \mathbf{0})$. Thus, for any subspace $X$, we
have $\bar{L}(X) = X \times \mathbf{0}$, implying $\dim L(X) = \dim \bar{L}(X) = \dim X$. □

The final possibility to consider is the image of a linear map $L : \mathbb{R}^n \to \mathbb{R}^p$ whose
rank may not be maximal. In this case we need make no assumption about the relative
sizes of $n$ and $p$.

Theorem 2.29. If $L : \mathbb{R}^n \to \mathbb{R}^p$ has rank $r < p$, then $\operatorname{im} L$ in $\mathbb{R}^p$ is the graph of a
linear map $M : \mathbb{R}^r \to \mathbb{R}^{p-r}$.

Proof. As a preliminary step, write $L : \mathbb{R}^n \to \mathbb{R}^{r+q}$, where $q = p - r$ and $q > 0$
by hypothesis. The rest of the proof now proceeds by analogy with the proof of
Theorem 2.22.

In every case, $r \le n$. If $r = n$, the previous theorem applies. If $r = 0$, then $L$
is identically 0 and $\operatorname{im} L = \mathbb{R}^0$. In this case, $\operatorname{im} L$ is the graph of the linear map

\[ M : \mathbb{R}^0 \to \mathbb{R}^p : 0 \mapsto (0, 0, \ldots, 0). \]

The only remaining possibility is $0 < r < n$. Let $A$ be the $p \times n$ matrix that represents
$L$. The $n$ columns of $A$ are elements of $\mathbb{R}^p$ that span $\operatorname{im} L$ in $\mathbb{R}^p$. Our assumption
implies that only $r$ of the $n$ columns are linearly independent; the remaining $n - r > 0$
columns depend linearly upon these. Delete the dependent columns from $A$ to create
a new matrix $A^*$ of size $p \times r = (r + q) \times r$, and let $L^* : \mathbb{R}^r \to \mathbb{R}^{r+q}$ be the linear
map defined by $A^*$. Because the columns of $A^*$ and $A$ span the same $r$-dimensional
linear subspace of $\mathbb{R}^p = \mathbb{R}^{r+q}$, we have $\operatorname{im} L^* = \operatorname{im} L$.

By construction, $L^*$ has maximal rank $r$, so Theorem 2.24 applies: $\operatorname{im} L^* = \operatorname{im} L$
is the graph of a linear map $M : \mathbb{R}^r \to \mathbb{R}^q = \mathbb{R}^{p-r}$. □


Exercises 

2.1. Show that the linear map $M_1$ (p. 29) alters the slope of any line that is neither
horizontal nor vertical. Specifically, show that if a line has slope $\Delta v/\Delta u = m$,
its image has slope $\Delta y/\Delta x = 3m/10 \neq m$ if $m \neq 0, \infty$.

2.2. Determine the coordinate change from $(u, v)$ to $(\bar{u}, \bar{v})$ (and from $(x, y)$ to $(\bar{x}, \bar{y})$)
that converts the matrix $M_5$ into $\bar{M}_5$ (cf. p. 34).

2.3. Let $A$, $B$, and $C$ be $n \times n$ matrices. Suppose $C$ is equivalent to $B$ (cf. Definition 2.1,
p. 33) and $B$ is equivalent to $A$; show that $C$ is equivalent to $A$.

2.4. This question concerns the map $M : \mathbb{R}^2 \to \mathbb{R}^2 : (u, v) \mapsto (x, y)$ defined by the
matrix



a. Determine the area multiplier for $M$.

b. Sketch in the $(x, y)$-plane the image of the standard unit grid from the
$(u, v)$-plane.

c. Show that the image of the line in the direction of the vector $(u, v) = (2, 1)$
is the line in the direction of the vector $(x, y) = (2, 1)$ (in the $(x, y)$-plane).
In other words, show that this line is invariant under $M$.

d. Show that the same is true for the line in the direction of the vector $(u, v) = (-1, 2)$.


e. Sketch in the $(x, y)$-plane the image of the solid gray grid shown in the
$(u, v)$-plane. (Notice that the lines in this grid are parallel to the lines from
parts (c) and (d).)

f. What is the shape of the image of a single square from the new grid? What
are the dimensions of that shape if you use that solid gray grid to define
new unit lengths? What is the area of that shape in terms of these new
units? What is the area multiplier in terms of these new units?

2.5. Determine the eigenvalues and eigenvectors of each of the following matrices/maps.

a. $M = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}$


b. M = 



c. M = 


( 0 V6\ 

U-i ) 


2.6. Carry out an analysis similar to what you did in Exercise 2.4 for the linear
map defined by each of the following matrices.

a. $M = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}$

b. M = 



In particular, construct a grid whose image is “parallel to itself.” Note that, 
in the second case, the grid consists of parallelograms rather than rectangles 
(or squares). Determine the linear multipliers for the map and show that the 
sides of the grid are stretched by these factors. Determine the area multiplier 
for the map and indicate how your diagram confirms that value. Comment on 
how the map affects orientation. 


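2.7. Consider the following from Example 8 (p. 38):

\[ M_8 = \begin{pmatrix} 4 & -2 \\ 2 & 0 \end{pmatrix}, \qquad \mathbf{u} = \mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{v} = \mathbf{y} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]

a. You know $M_8$ has the repeated eigenvalue 2, and that therefore the kernel
of the matrix $M_8 - 2I$ contains all the eigenvectors of $M_8$. Show that the
dimension of the image of $M_8 - 2I$ is 1; by the rank-nullity theorem, the
dimension of the set of eigenvectors of $M_8$ is only 1, not 2.

b. Show that $M_8(\mathbf{u}) = 2\mathbf{x}$, $M_8(\mathbf{v}) = 2\mathbf{x} + 2\mathbf{y}$. In particular, identify an eigenvector
of $M_8$.

c. Let $(\bar{u}, \bar{v})$ be coordinates based on $\mathbf{u}$ and $\mathbf{v}$, and let $(\bar{x}, \bar{y})$ be coordinates
based, in the same way, on $\mathbf{x}$ and $\mathbf{y}$. Show that the coordinate change from
the standard basis to the new one is

\[ \begin{aligned} u &= \bar{u} + \bar{v}, \\ v &= \bar{u}, \end{aligned} \qquad \begin{aligned} x &= \bar{x} + \bar{y}, \\ y &= \bar{x}. \end{aligned} \]

d. Show that, in terms of the new coordinates, $M_8$ becomes

\[ \bar{M}_8 : \begin{cases} \bar{x} = 2\bar{u} + 2\bar{v}, \\ \bar{y} = 2\bar{v}. \end{cases} \]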

2.8. a. Show that the complex numbers $e^{\pm i\theta} = \cos\theta \pm i \sin\theta$ are the eigenvalues
of the matrix

\[ R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

b. Show that $R_\theta$ rotates the plane by $\theta$ radians by showing that

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} r\cos\alpha \\ r\sin\alpha \end{pmatrix} = \begin{pmatrix} r\cos(\alpha + \theta) \\ r\sin(\alpha + \theta) \end{pmatrix}. \]

In other words, $R_\theta$ maps the point with polar coordinates $(r, \alpha)$ to the point
with polar coordinates $(r, \alpha + \theta)$. Explain why this implies $R_\theta$ has no real
eigenvectors when $\theta \neq n\pi$, $n$ integer.

c. Find a (complex) eigenvector for each of the eigenvalues of $R_\theta$, $\theta \neq n\pi$.

2.9. Show that the only matrix equivalent (cf. Definition 2.1, p. 33) to the uniform
dilation $D = \lambda I$ is $D$ itself.

2.10. Let $\arctan(y/x)$, viewed as a function of two variables, be defined in terms of
the usual arctangent function for all $(x, y) \neq (0, 0)$ as follows:

\[ \arctan(y/x) = \begin{cases} \arctan(y/x), & 0 < x, \\ \pi/2, & x = 0,\ 0 < y, \\ -\pi/2, & x = 0,\ y < 0, \\ \arctan(y/x) - \pi, & x < 0,\ y < 0, \\ \arctan(y/x) + \pi, & x < 0,\ 0 < y. \end{cases} \]

a. Show that $\arctan(y/x)$ is continuous across the $y$-axis, and is thus continuous
on $\mathbb{R}^2 \setminus \{(x, 0) \mid x \le 0\}$; this is the plane with the origin and the negative
$x$-axis deleted.

b. The graph $z = \arctan(y/x)$ is a spiral ramp; sketch it.


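2.11. Suppose the $2 \times 2$ matrix $M$ has real unequal eigenvalues $\lambda_1$ and $\lambda_2$, with
corresponding eigenvectors $\mathbf{u}_1$ and $\mathbf{u}_2$.

a. Explain why $\mathbf{u}_1$ and $\mathbf{u}_2$ are linearly independent.

b. Let $G$ be the matrix whose columns are $\mathbf{u}_1$ and $\mathbf{u}_2$, in that order. Explain
why $G$ is invertible, and then show

\[ G^{-1}MG = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. \]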

2.12. Suppose the $2 \times 2$ matrix $M$ has the repeated eigenvalue $\lambda \neq 0$, but has only
a single eigendirection, along the eigenvector $\mathbf{u}$. The purpose of this exercise
is to show that $M$ is equivalent to the standard shear-dilation matrix

\[ S_\lambda = \begin{pmatrix} \lambda & \lambda \\ 0 & \lambda \end{pmatrix}. \]

a. Let $\mathbf{e}_1$ and $\mathbf{e}_2$ be the standard basis vectors. Show by direct computation
that the vectors $(M - \lambda I)\mathbf{e}_1$ and $(M - \lambda I)\mathbf{e}_2$ are both eigenvectors of $M$.
Conclude that $(M - \lambda I)\mathbf{w}$ is an eigenvector of $M$ for every $\mathbf{w}$ in $\mathbb{R}^2$. Suggestion:
The vectors $(M - \lambda I)\mathbf{e}_1$ and $(M - \lambda I)\mathbf{e}_2$ are the columns of the
matrix $M - \lambda I$.

b. The image of $M - \lambda I$ is 1-dimensional and contains $\lambda\mathbf{u}$; why? Conclude
that there is a vector $\mathbf{v}$ for which $(M - \lambda I)\mathbf{v} = \lambda\mathbf{u}$. Explain why $\mathbf{u}$ and $\mathbf{v}$
are linearly independent.

c. Let $G$ be the matrix whose columns are $\mathbf{u}$ and $\mathbf{v}$. Explain why $G$ is invertible,
and show that $G^{-1}MG = S_\lambda$.

2.13. Suppose the real $2 \times 2$ matrix $M$ has complex eigenvalues $a \pm bi$, $b \neq 0$, and
the real vectors $\mathbf{u}$ and $\mathbf{v}$ form the complex eigenvector $\mathbf{u} + i\mathbf{v}$ for $M$ with
eigenvalue $a - bi$ (note the difference in signs). The purpose of this exercise
is to show that $M$ is equivalent to the standard rotation-dilation matrix $C_{a,b}$
(cf. p. 39).

a. Show that the following real matrix equations are true:

\[ M\mathbf{u} = a\mathbf{u} + b\mathbf{v}, \qquad M\mathbf{v} = -b\mathbf{u} + a\mathbf{v}. \]

b. Let $G$ be the matrix whose columns are $\mathbf{u}$ and $\mathbf{v}$, in that order. Show that
$MG = GC_{a,b}$.

c. Show that the real vectors $\mathbf{u}$ and $\mathbf{v}$ are linearly independent in $\mathbb{R}^2$. Suggestion:
first show $\mathbf{u} \neq \mathbf{0}$, $\mathbf{v} \neq \mathbf{0}$. Then suppose there are real numbers $r$, $s$ for
which $r\mathbf{u} + s\mathbf{v} = \mathbf{0}$. Show that $\mathbf{0} = M(r\mathbf{u} + s\mathbf{v})$ implies that $-s\mathbf{u} + r\mathbf{v} = \mathbf{0}$,
and hence that $r = s = 0$.

d. Conclude that $G$ is invertible and $G^{-1}MG = C_{a,b}$.

2.14. Notice that, in Exercise 2.6, the map whose invariant grid was rectangular
(and hence whose eigenvectors were orthogonal) was the one whose matrix
was symmetric. A matrix is symmetric if it is equal to its own transpose; the
$2 \times 2$ symmetric matrices are

\[ M = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \quad a, b, \text{ and } c \text{ real.} \]

The purpose of this exercise is to show that a symmetric matrix always has
linearly independent orthogonal eigenvectors that define an invariant rectangular
grid.

a. Show that the eigenvalues of a $2 \times 2$ symmetric matrix $M$ are real. (Suggestion:
Look at the formula for the roots of the quadratic equation that gives
the eigenvalues and focus on the part of the formula that leads to complex
values.)

b. Suppose a $2 \times 2$ symmetric matrix $M$ has unequal eigenvalues. Show that
the eigenvectors that correspond to these eigenvalues are orthogonal (i.e.,
their dot product is zero).

c. Suppose a $2 \times 2$ symmetric matrix $M$ has equal eigenvalues. Show that this
can happen only if $b = 0$ and $a = c$. (Again, look at the formula for the
roots.) This implies $M$ must reduce to a multiple of the identity matrix.

In the last case (c), every nonzero vector is an eigenvector. Therefore, for
every symmetric $2 \times 2$ matrix $M$, $\mathbb{R}^2$ has a basis consisting of orthogonal
eigenvectors of $M$. Thus if a linear map $M : \mathbb{R}^2 \to \mathbb{R}^2$ is represented by a
symmetric matrix, there is a grid of squares that is stretched into a parallel
grid of rectangles (as happened in Exercises 2.4 and 2.6a).

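2.15. The purpose of this exercise is to show that the area of $\mathbf{v} \wedge \mathbf{w}$ is the determinant
of the matrix whose columns are the coordinates of $\mathbf{v}$ and $\mathbf{w}$.

a. Let $\theta$ be the angle between $\mathbf{v}$ and $\mathbf{w}$, taken from $\mathbf{v}$ to $\mathbf{w}$ and chosen so
that $-\pi < \theta \le \pi$. Thus, $\theta$ is negative precisely when $\mathbf{v} \wedge \mathbf{w}$ has negative
orientation. Show that $\operatorname{area}(\mathbf{v} \wedge \mathbf{w}) = \|\mathbf{v}\| \|\mathbf{w}\| \sin\theta$. (Notice that this has a
negative value when $\mathbf{v} \wedge \mathbf{w}$ has negative orientation.)

b. Show that $\operatorname{area}^2(\mathbf{v} \wedge \mathbf{w}) = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2 - (\mathbf{v} \cdot \mathbf{w})^2$.

c. Let $\mathbf{v} = (v_1, v_2)$, $\mathbf{w} = (w_1, w_2)$. Show that

\[ \|\mathbf{v}\|^2 \|\mathbf{w}\|^2 - (\mathbf{v} \cdot \mathbf{w})^2 = (v_1 w_2 - v_2 w_1)^2, \]

and hence that $\operatorname{area}(\mathbf{v} \wedge \mathbf{w}) = \pm(v_1 w_2 - v_2 w_1)$.

d. Show that requiring the area of the unit square $(1, 0) \wedge (0, 1)$ be positive
implies we should choose the positive root in part (c):

\[ \operatorname{area}(\mathbf{v} \wedge \mathbf{w}) = v_1 w_2 - v_2 w_1 = \det \begin{pmatrix} v_1 & w_1 \\ v_2 & w_2 \end{pmatrix}. \]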


2.16. Suppose M = 





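a. Let $V$ be the matrix whose columns are $\mathbf{v}$ and $\mathbf{w}$, and let $\bar{V}$ be the matrix
whose columns are $M(\mathbf{v})$ and $M(\mathbf{w})$. Show that $\bar{V} = MV$ and hence that
$\det \bar{V} = \det M \times \det V$.

b. Conclude that

\[ \operatorname{area} M(\mathbf{v} \wedge \mathbf{w}) = \operatorname{area} M(\mathbf{v}) \wedge M(\mathbf{w}) = \det M \times \operatorname{area}(\mathbf{v} \wedge \mathbf{w}). \]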

2.17. The aim of this exercise is to show that the determinant of a $2 \times 2$ matrix can
be viewed as a certain function $D(\mathbf{v}, \mathbf{w})$ of its columns $\mathbf{v}$ and $\mathbf{w}$ that is uniquely
characterized by the following three properties.

• $D(\mathbf{e}_1, \mathbf{e}_2) = 1$, where $\mathbf{e}_i$ is the $i$th column of the identity matrix.

• $D(\mathbf{v}, \mathbf{w}) = 0$ if $\mathbf{v} = \mathbf{w}$.

• $D(\mathbf{v}, \mathbf{w})$ is a linear function of each of its arguments $\mathbf{v}$ and $\mathbf{w}$. That is,

\[ \begin{aligned} D(t\mathbf{v}, \mathbf{w}) &= tD(\mathbf{v}, \mathbf{w}), \\ D(\mathbf{v}_1 + \mathbf{v}_2, \mathbf{w}) &= D(\mathbf{v}_1, \mathbf{w}) + D(\mathbf{v}_2, \mathbf{w}), \end{aligned} \]

and similarly for the second argument. We say $D$ is bilinear.

a. Show that $D$ is antisymmetric; that is, $D(\mathbf{y}, \mathbf{x}) = -D(\mathbf{x}, \mathbf{y})$ for all $\mathbf{x}$ and $\mathbf{y}$.
Suggestion: First expand $D(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y})$ to four terms using the bilinearity
of $D$ (this is a kind of “FOIL”). The second property will guarantee two of
those terms are zero; the remaining two terms then give the result.

b. Show that $D(\mathbf{e}_2, \mathbf{e}_1) = -1$.

c. Let $\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2$ and $\mathbf{w} = w_1\mathbf{e}_1 + w_2\mathbf{e}_2$. Using the bilinearity of $D$
(“FOIL” again!), show that

\[ D(\mathbf{v}, \mathbf{w}) = v_1 w_2 D(\mathbf{e}_1, \mathbf{e}_2) + v_2 w_1 D(\mathbf{e}_2, \mathbf{e}_1) = v_1 w_2 - v_2 w_1, \]

proving that $D(\mathbf{v}, \mathbf{w}) = \det \begin{pmatrix} v_1 & w_1 \\ v_2 & w_2 \end{pmatrix}$.

The reason we have paused here to characterize the determinant of a $2 \times 2$ matrix by
these three properties is that we later use (suitable modifications of) the same three
properties to define the determinant of an $n \times n$ matrix. (See the exercises below.)
The properties of $D(\mathbf{v}, \mathbf{w})$ are the properties of $\operatorname{area}(\mathbf{v} \wedge \mathbf{w})$; thus the determinant
of an $n \times n$ matrix will be connected with the volume of an $n$-dimensional parallelepiped,
the analogue of a 2-dimensional parallelogram.

2.18. Determine $(5, 2, -1) \times (3, 4, 2)$ and $(1, 1, 1) \times (1, 1, -1)$.

2.19. Determine the volume of the parallelepiped:

a. $(5, 2, -1) \wedge (3, 4, 2) \wedge (1, 0, -1)$.

b. $(1, 1, 1) \wedge (1, 1, -1) \wedge (1, -1, -1)$.

2.20. Consider the linear map $M : \mathbb{R}^3 \to \mathbb{R}^3$ and the parallelepiped $P$ defined by


M = 




a. What is the orientation of $P$? What is the volume of $P$?

b. Describe the parallelepiped $M(P)$ by listing its edges, in proper order.

c. Determine directly from your answer in (b) the orientation and volume of
$M(P)$.

d. What is the volume multiplier of $M$? Does this value account for the orientation
and volume of $M(P)$ you found in part (c)?


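2.21. Suppose $\mathbf{z} = \mathbf{x} \times \mathbf{y} \neq \mathbf{0}$. Show that the linear map $L : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[ L(\mathbf{x}) = \mathbf{y}, \quad L(\mathbf{y}) = \mathbf{x}, \quad L(\mathbf{z}) = -\mathbf{z}, \]

has positive determinant and maps $\mathbf{x} \wedge \mathbf{y}$ to $\mathbf{y} \wedge \mathbf{x}$.

2.22. Show that $\operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \bar{\mathbf{z}}) = \operatorname{vol}(\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z})$, where $\bar{\mathbf{z}} = \mathbf{z} + \alpha\mathbf{x} + \beta\mathbf{y}$ and $\alpha$ and $\beta$
are arbitrary. One way to do this is to note that the two parallelepipeds $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$
and $\mathbf{x} \wedge \mathbf{y} \wedge \bar{\mathbf{z}}$ have the same base $\mathbf{x} \wedge \mathbf{y}$, and their third edges $\mathbf{z}$ and $\bar{\mathbf{z}}$ lie the
same distance above that base. Draw a picture.

2.23. The parallelepipeds obtained from $\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ by an arbitrary permutation of $\mathbf{x}$,
$\mathbf{y}$, and $\mathbf{z}$ are all equal as sets. However, they differ in orientation. Show that
the cyclic permutations $\mathbf{y} \wedge \mathbf{z} \wedge \mathbf{x}$ and $\mathbf{z} \wedge \mathbf{x} \wedge \mathbf{y}$ have the same orientation as
$\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$, but the permutations that simply transpose a pair of edges, namely
$\mathbf{y} \wedge \mathbf{x} \wedge \mathbf{z}$, $\mathbf{x} \wedge \mathbf{z} \wedge \mathbf{y}$, and $\mathbf{z} \wedge \mathbf{y} \wedge \mathbf{x}$, all have the opposite orientation. We can
express this in the following way.

\[ \begin{aligned} \mathbf{y} \wedge \mathbf{z} \wedge \mathbf{x} = \mathbf{z} \wedge \mathbf{x} \wedge \mathbf{y} &= \mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}, \\ \mathbf{x} \wedge \mathbf{z} \wedge \mathbf{y} = \mathbf{y} \wedge \mathbf{x} \wedge \mathbf{z} = \mathbf{z} \wedge \mathbf{y} \wedge \mathbf{x} &= -\,\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}. \end{aligned} \]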

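2.24. The purpose of this exercise is to show that the determinant of a $3 \times 3$ matrix
is a certain function $D$ of its columns that is uniquely defined by the following
three properties. This is exactly analogous to the $2 \times 2$ case as dealt with in
Exercise 2.17 (p. 61), and is preparation for the $n \times n$ case addressed below
(Exercise 2.28).

• $D(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3) = 1$, where $\mathbf{e}_i$ is the $i$th column of the identity matrix.

• $D(\mathbf{x}, \mathbf{y}, \mathbf{z}) = 0$ if any two of the columns $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$ are equal.

• $D(\mathbf{x}, \mathbf{y}, \mathbf{z})$ is a linear function of each of its arguments $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. That is,

\[ \begin{aligned} D(t\mathbf{x}, \mathbf{y}, \mathbf{z}) &= tD(\mathbf{x}, \mathbf{y}, \mathbf{z}), \\ D(\mathbf{x}_1 + \mathbf{x}_2, \mathbf{y}, \mathbf{z}) &= D(\mathbf{x}_1, \mathbf{y}, \mathbf{z}) + D(\mathbf{x}_2, \mathbf{y}, \mathbf{z}), \end{aligned} \]

and similarly for the second and third arguments. We say $D$ is multilinear.

a. Show that $D$ is antisymmetric; that is, $D$ changes sign when any two
columns are interchanged:

\[ D(\mathbf{x}, \mathbf{z}, \mathbf{y}) = D(\mathbf{z}, \mathbf{y}, \mathbf{x}) = D(\mathbf{y}, \mathbf{x}, \mathbf{z}) = -D(\mathbf{x}, \mathbf{y}, \mathbf{z}). \]

(Compare this with the previous exercise.)

b. Show that

\[ \begin{aligned} D(\mathbf{e}_1, \mathbf{e}_3, \mathbf{e}_2) = D(\mathbf{e}_3, \mathbf{e}_2, \mathbf{e}_1) = D(\mathbf{e}_2, \mathbf{e}_1, \mathbf{e}_3) &= -1, \\ D(\mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_1) = D(\mathbf{e}_3, \mathbf{e}_1, \mathbf{e}_2) = D(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3) &= +1. \end{aligned} \]

c. Write $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$ in terms of their coordinates with respect to the standard basis:

\[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + x_3\mathbf{e}_3; \]

similarly, $\mathbf{y} = y_1\mathbf{e}_1 + y_2\mathbf{e}_2 + y_3\mathbf{e}_3$; $\mathbf{z} = z_1\mathbf{e}_1 + z_2\mathbf{e}_2 + z_3\mathbf{e}_3$. Using the multilinearity
of $D$, expand $D(\mathbf{x}, \mathbf{y}, \mathbf{z})$ as a sum of 27 terms of the form
$x_i y_j z_k D(\mathbf{e}_i, \mathbf{e}_j, \mathbf{e}_k)$. Note that, in a given one of these expressions, the indices
$i$, $j$, $k$ need not be distinct.

d. Precisely 21 of the 27 terms you just obtained are automatically zero.
Which ones, and why?

e. Show that the remaining six terms yield

\[ D(\mathbf{x}, \mathbf{y}, \mathbf{z}) = x_1 y_2 z_3 - x_1 y_3 z_2 + x_2 y_3 z_1 - x_2 y_1 z_3 + x_3 y_1 z_2 - x_3 y_2 z_1 = \begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}, \]

the familiar $3 \times 3$ determinant. Hence, the $3 \times 3$ determinant is uniquely
determined by the three properties defining $D$.

2.25. Show that $D(\mathbf{x}, \mathbf{y}, \bar{\mathbf{z}}) = D(\mathbf{x}, \mathbf{y}, \mathbf{z})$, where $\bar{\mathbf{z}} = \mathbf{z} + \alpha\mathbf{x} + \beta\mathbf{y}$ and $\alpha$ and $\beta$ are
arbitrary. This is obviously the same result as Exercise 2.22 above; however,
you should prove it here using only the properties of $D$ defined and deduced
in the previous exercise.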

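2.26. Let $P = \mathbf{v} \wedge \mathbf{w}$ be the parallelogram in $\mathbb{R}^3 : (u_1, u_2, u_3)$ spanned by $\mathbf{v} = (1, 1, 2)$
and $\mathbf{w} = (1, 0, -1)$.

a. For $i = 1, 2, 3$, describe in the form $P_i = \mathbf{v}_i \wedge \mathbf{w}_i$ the projection of $P$ onto
the coordinate plane $u_i = 0$. For clarity, write $\mathbf{v}_i$ and $\mathbf{w}_i$ as elements of $\mathbb{R}^2$
rather than $\mathbb{R}^3$.

b. Determine the areas of the three projections $P_i$; then use the “Pythagorean”
theorem to calculate

\[ \operatorname{area} P = \sqrt{\operatorname{area}^2 P_1 + \operatorname{area}^2 P_2 + \operatorname{area}^2 P_3}. \]

c. Let $V$ be the $3 \times 2$ matrix whose columns are the components of $\mathbf{v}$ and $\mathbf{w}$,
in that order, and let $V^\dagger$ be its transpose. Show that

\[ V^\dagger V = \begin{pmatrix} \mathbf{v} \cdot \mathbf{v} & \mathbf{v} \cdot \mathbf{w} \\ \mathbf{w} \cdot \mathbf{v} & \mathbf{w} \cdot \mathbf{w} \end{pmatrix}, \]

implying $\det V^\dagger V = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2 - (\mathbf{v} \cdot \mathbf{w})^2$. Show that this is $\operatorname{area}^2 P$ (cf. Exercise 2.15,
p. 61). Confirm that this value of $\operatorname{area} P$ agrees with the value
you found in part (b).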

2.27. Show that $(\mathbf{x} \cdot \mathbf{y})^2 \le \|\mathbf{x}\|^2 \|\mathbf{y}\|^2$ for all vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$. (Suggestion: both
sides are zero if $\mathbf{y} = \mathbf{0}$, so assume $\mathbf{y} \neq \mathbf{0}$ and let



Now consider the implications of $0 \le \mathbf{z} \cdot \mathbf{z}$.)

The next exercise defines the determinant of an $n \times n$ matrix $V$ as a function $D$ of its
columns that satisfies certain properties. We have already seen how this approach
works with a $2 \times 2$ matrix (Exercise 2.17, p. 61) and with a $3 \times 3$ matrix (Exercise 2.24).

As we saw in the earlier exercises, and see again in the $n \times n$ case, an essential
property of $D$ is antisymmetry. A description of this property involves rearranging,
or permuting, the columns. When there were only two or three columns, this was
simple to follow. However, it is useful here to pause and introduce some facts about
permutations of an arbitrary number, $n$, of objects.

A permutation on $n$ elements is an invertible map $\pi$ of the set $\{1, 2, \ldots, n\}$ to
itself. A transposition is a permutation $\tau_{i,j}$ that interchanges the elements $i$ and $j$
and leaves all the others unchanged: $\tau_{i,j}(i) = j$; $\tau_{i,j}(j) = i$; $\tau_{i,j}(k) = k$, $k \neq i, j$. The
product of two permutations is the permutation that results from their composition:
$(\pi_1 \cdot \pi_2)(i) = (\pi_1 \circ \pi_2)(i) = \pi_1(\pi_2(i))$. The identity map is a permutation and it is the
identity element in the product. (With this product, the set $S_n$ of permutations on $n$
elements is a group with $n!$ elements.)

Every permutation can be written as a product of transpositions. The number of
transpositions in such a product is not unique, but its parity is. Therefore, we say a
permutation $\pi$ is even, and write $\operatorname{sgn} \pi = +1$, if $\pi$ is always the product of an even
number of transpositions; we say it is odd, and write $\operatorname{sgn} \pi = -1$, if it is always the
product of an odd number of transpositions.

If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are the columns of the matrix $V$, then $D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n)$ is a function
that satisfies the following conditions:

• $D(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n) = 1$, where $\mathbf{e}_i$ is the $i$th column of the $n \times n$ identity matrix.

• $D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n) = 0$ if any two of the columns $\mathbf{v}_i$ are equal.

• $D$ is multilinear; that is, $D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n)$ is a linear function of each of its
arguments $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

Note: It is not yet evident that there is such a function $D$; the exercise shows that $D$
exists.
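The parity of a permutation can be computed by actually writing it as a product of transpositions. The following is a small sketch under stated assumptions: permutations are stored 0-based as tuples (so `perm[i]` is the image of `i`), and the function names are hypothetical. It sorts the permutation by swaps and counts them, then cross-checks the result against the parity of the number of inversions.

```python
from itertools import permutations


def sign_by_transpositions(perm):
    """Sort perm into the identity by swaps; the swap count's parity gives the sign."""
    perm = list(perm)
    swaps = 0
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            # One transposition: it moves the value j into position j.
            perm[i], perm[j] = perm[j], perm[i]
            swaps += 1
    return 1 if swaps % 2 == 0 else -1


def sign_by_inversions(perm):
    """Parity of the number of pairs that appear out of order."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1


# The two methods agree on every permutation of four elements.
for p in permutations(range(4)):
    assert sign_by_transpositions(p) == sign_by_inversions(p)
```

The agreement of the two counts illustrates the claim that parity does not depend on how the permutation is decomposed; it does not prove it.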

2.28. a. Show that $D$ is antisymmetric; that is, $D$ changes signs when any two
columns are interchanged. In terms of permutations: if $\pi$ is a transposition,
then

\[ D(\mathbf{v}_{\pi(1)}, \mathbf{v}_{\pi(2)}, \ldots, \mathbf{v}_{\pi(n)}) = -D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n). \]

b. Suppose $\pi$ is a permutation of $\{1, 2, \ldots, n\}$. Show that

\[ D(\mathbf{e}_{\pi(1)}, \mathbf{e}_{\pi(2)}, \ldots, \mathbf{e}_{\pi(n)}) = \operatorname{sgn} \pi. \]

c. Suppose that the map $\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ is not a permutation,
that is, $\pi$ is not onto or one-to-one. Explain why

\[ D(\mathbf{e}_{\pi(1)}, \mathbf{e}_{\pi(2)}, \ldots, \mathbf{e}_{\pi(n)}) = 0. \]

d. First write each column $\mathbf{v}_j$ as a linear combination of the standard basis
elements $\mathbf{e}_i$:

\[ \begin{aligned} \mathbf{v}_1 &= v_{11}\mathbf{e}_1 + v_{21}\mathbf{e}_2 + \cdots + v_{n1}\mathbf{e}_n, \\ \mathbf{v}_2 &= v_{12}\mathbf{e}_1 + v_{22}\mathbf{e}_2 + \cdots + v_{n2}\mathbf{e}_n, \\ &\ \,\vdots \\ \mathbf{v}_n &= v_{1n}\mathbf{e}_1 + v_{2n}\mathbf{e}_2 + \cdots + v_{nn}\mathbf{e}_n. \end{aligned} \]

Then, using the multilinearity of $D$, expand $D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n)$ as a sum of
$n^n$ terms of the form

\[ v_{\pi(1),1}\, v_{\pi(2),2} \cdots v_{\pi(n),n}\, D(\mathbf{e}_{\pi(1)}, \mathbf{e}_{\pi(2)}, \ldots, \mathbf{e}_{\pi(n)}). \]

e. Most of the $n^n$ terms you just obtained in the previous part are automatically
zero; why? Which ones are not?

f. Conclude that

\[ D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n) = \sum_{\pi \text{ in } S_n} (\operatorname{sgn} \pi)\, v_{\pi(1),1}\, v_{\pi(2),2} \cdots v_{\pi(n),n}. \]

This formula for $D$ shows that there is one and only one function $D$ that
satisfies the three given properties.

This exercise gives us a way to define the determinant of an $n \times n$ matrix.

Definition 2.8 Suppose $V = (v_{ij})$ is the $n \times n$ matrix that has the element $v_{ij}$ in the
$i$th row and $j$th column. Let $\mathbf{v}_j = (v_{ij})$, $i = 1, \ldots, n$, denote the $j$th column of $V$, so
$V = (\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n)$. The determinant of $V$ is

\[ \det V = D(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n) = \sum_{\pi \text{ in } S_n} (\operatorname{sgn} \pi)\, v_{\pi(1),1}\, v_{\pi(2),2} \cdots v_{\pi(n),n}. \]

Thus, the determinant of $V$ is the sum of all possible products of $n$ elements of $V$,
one taken from each row and each column, switching the sign of a particular product
if the row indices represent an odd permutation of the column indices.

2.29. Write out the 24 terms of the determinant of a $4 \times 4$ matrix.

2.30. Suppose that $V = (v_{ij})$ and $v_{ij} = 0$ if $i > j$. This is called an upper triangular
matrix, because all entries below the main diagonal are zero. Show that
$\det V = v_{11} v_{22} \cdots v_{nn}$.

2.31. Let $A$ and $B$ be $2 \times 2$ matrices, and let $O$ denote the $2 \times 2$ matrix whose elements
are all 0. Find the determinant of each of the following $4 \times 4$ matrices
in terms of $\det A$ and $\det B$.

2.32. Show that a square matrix $A$ with a row or a column of zeros has $\det A = 0$.

2.33. Show that $A$ and $A^\dagger$ have the same determinant.

2.34. The minor $M_{ij}$ of $A$ is the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$th
row and $j$th column of $A$. Show that we can “expand $\det A$ by minors along
the $i$th row” in the following way.

\[ \det A = (-1)^{i+1} a_{i1} \det M_{i1} + (-1)^{i+2} a_{i2} \det M_{i2} + \cdots + (-1)^{i+n} a_{in} \det M_{in}. \]

Write the analogous formula to “expand by minors along the $j$th column.”
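The permutation sum in Definition 2.8 translates directly into code. The sketch below is illustrative only: it uses 0-based indices, computes $\operatorname{sgn} \pi$ from the parity of the inversion count (an assumption about how to obtain the sign, equivalent to counting transpositions), and the names `sign` and `det` are chosen here. It is far slower than elimination for large $n$, since it visits all $n!$ permutations.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple, via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return 1 if inversions % 2 == 0 else -1


def det(V):
    """Determinant as the signed sum over permutations of products V[perm(j)][j]."""
    n = len(V)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for j in range(n):
            # One entry from each column j, taken from row perm[j].
            term *= V[perm[j]][j]
        total += term
    return total


assert det([[1, 2], [3, 4]]) == -2
# Upper triangular: only the identity permutation contributes (cf. Exercise 2.30).
assert det([[2, 1, 5], [0, 3, 7], [0, 0, 4]]) == 24
```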

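The definition of a parallelogram in $\mathbb{R}^3$ suggests we can define similar objects in
$\mathbb{R}^n$, $n > 3$; in fact, we can generalize Definition 2.4 to define a parallelepiped of any
dimension $k \le n$ in $\mathbb{R}^n$.

The $k$-dimensional parallelepiped $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \cdots \wedge \mathbf{v}_k$ is the set of vectors

\[ \mathbf{w} = \sum_{i=1}^{k} t_i \mathbf{v}_i, \]

where $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are vectors in $\mathbb{R}^n$ and $0 \le t_i \le 1$, $i = 1, \ldots, k$. We take $k \le n$, but
if $k \neq n$, $\mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_k$ is not oriented. We continue to call a 2-parallelepiped $\mathbf{v} \wedge \mathbf{w}$ a
parallelogram.

2.35. Let $\mathbf{v} \wedge \mathbf{w}$ be a parallelogram in $\mathbb{R}^n$, and let $V$ be the $n \times 2$ matrix whose
columns are the coordinates of the vectors $\mathbf{v}$ and $\mathbf{w}$, in that order. Show that
$\det V^\dagger V = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2 - (\mathbf{v} \cdot \mathbf{w})^2$. By Exercise 2.26, we can take this to be
$\operatorname{area}^2(\mathbf{v} \wedge \mathbf{w})$.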

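Now that we have the area of a parallelogram, we can define the $k$-volume of a $k$-parallelepiped
inductively on $k$. We start with a 3-parallelepiped $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3$. Think
of this as having base $\mathbf{v}_1 \wedge \mathbf{v}_2$; then we want the “3-volume” to be the area of the
base times the perpendicular height:

\[ \operatorname{vol}(\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3) = \operatorname{area}(\mathbf{v}_1 \wedge \mathbf{v}_2)\, \|\mathbf{h}\|. \]

Here $\mathbf{h}$ is the vector that is orthogonal to the base $\mathbf{v}_1 \wedge \mathbf{v}_2$ and in the plane that
contains $\mathbf{v}_3$ and is parallel to the base. A vector in that parallel plane is of the form

\[ \mathbf{h} = \mathbf{v}_3 - a_1\mathbf{v}_1 - a_2\mathbf{v}_2, \]

for some real numbers $a_1$ and $a_2$. The orthogonality condition on $\mathbf{h}$ gives us two
equations,

\[ \begin{aligned} 0 &= \mathbf{v}_1 \cdot \mathbf{h} = \mathbf{v}_1 \cdot \mathbf{v}_3 - a_1 \mathbf{v}_1 \cdot \mathbf{v}_1 - a_2 \mathbf{v}_1 \cdot \mathbf{v}_2, \\ 0 &= \mathbf{v}_2 \cdot \mathbf{h} = \mathbf{v}_2 \cdot \mathbf{v}_3 - a_1 \mathbf{v}_2 \cdot \mathbf{v}_1 - a_2 \mathbf{v}_2 \cdot \mathbf{v}_2, \end{aligned} \]

that we can convert into a matrix equation for the unknowns $a_1$ and $a_2$:

\[ \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{v}_3 \\ \mathbf{v}_2 \cdot \mathbf{v}_3 \end{pmatrix}. \]

There are unique values for $a_1$ and $a_2$ (that is, $\mathbf{h}$ is uniquely defined) precisely
when the matrix on the left is invertible. But notice that the determinant of that
matrix is, by Exercise 2.26, the square of the area of the base parallelogram $\mathbf{v}_1 \wedge \mathbf{v}_2$.
If $\operatorname{area}(\mathbf{v}_1 \wedge \mathbf{v}_2) = 0$, the 3-volume of $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3$ is zero; otherwise, we can find $\mathbf{h}$,
as above, and obtain $\operatorname{vol}(\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3)$. In fact, the volume is a determinant:

\[ \begin{vmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & 0 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & 0 \\ 0 & 0 & \mathbf{h} \cdot \mathbf{h} \end{vmatrix} = \begin{vmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 \end{vmatrix} \, \mathbf{h} \cdot \mathbf{h} = \operatorname{area}^2(\mathbf{v}_1 \wedge \mathbf{v}_2)\, \|\mathbf{h}\|^2 = \operatorname{vol}^2(\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3). \]

Now replace the zeros in the $3 \times 3$ determinant with $\mathbf{v}_1 \cdot \mathbf{h}$ and $\mathbf{v}_2 \cdot \mathbf{h}$, as appropriate.
Then substitute for the right-hand factor $\mathbf{h}$ in each entry in the third column its
expression as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$:

\[ \begin{aligned} \begin{vmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & \mathbf{v}_1 \cdot \mathbf{h} \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & \mathbf{v}_2 \cdot \mathbf{h} \\ \mathbf{h} \cdot \mathbf{v}_1 & \mathbf{h} \cdot \mathbf{v}_2 & \mathbf{h} \cdot \mathbf{h} \end{vmatrix} &= \begin{vmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & \mathbf{v}_1 \cdot \mathbf{v}_3 - a_1 \mathbf{v}_1 \cdot \mathbf{v}_1 - a_2 \mathbf{v}_1 \cdot \mathbf{v}_2 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & \mathbf{v}_2 \cdot \mathbf{v}_3 - a_1 \mathbf{v}_2 \cdot \mathbf{v}_1 - a_2 \mathbf{v}_2 \cdot \mathbf{v}_2 \\ \mathbf{h} \cdot \mathbf{v}_1 & \mathbf{h} \cdot \mathbf{v}_2 & \mathbf{h} \cdot \mathbf{v}_3 - a_1 \mathbf{h} \cdot \mathbf{v}_1 - a_2 \mathbf{h} \cdot \mathbf{v}_2 \end{vmatrix} \\ &= \begin{vmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & \mathbf{v}_1 \cdot \mathbf{v}_3 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & \mathbf{v}_2 \cdot \mathbf{v}_3 \\ \mathbf{h} \cdot \mathbf{v}_1 & \mathbf{h} \cdot \mathbf{v}_2 & \mathbf{h} \cdot \mathbf{v}_3 \end{vmatrix}. \end{aligned} \]

To get the simpler expression in the last step, above, we have used the multilinearity
of the determinant and the fact that the determinant of a matrix with two equal
columns is zero; cf. Exercise 2.25. The net result is that $\mathbf{h}$ is replaced by $\mathbf{v}_3$ in the
third column. The same substitution for the factor $\mathbf{h}$ in the third row, followed by a
similar addition of rows, will leave us with $\mathbf{h}$ replaced by $\mathbf{v}_3$ in the third row. We discover
that the volume is a determinant involving the vectors $\mathbf{v}_i$ in a symmetric way.

\[ \begin{vmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & \mathbf{v}_1 \cdot \mathbf{v}_3 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & \mathbf{v}_2 \cdot \mathbf{v}_3 \\ \mathbf{v}_3 \cdot \mathbf{v}_1 & \mathbf{v}_3 \cdot \mathbf{v}_2 & \mathbf{v}_3 \cdot \mathbf{v}_3 \end{vmatrix} = \operatorname{vol}^2(\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3) \]

Finally, if $V$ is the $n \times 3$ matrix whose columns are $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, in that order, then

\[ V^\dagger V = \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & \mathbf{v}_1 \cdot \mathbf{v}_3 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & \mathbf{v}_2 \cdot \mathbf{v}_3 \\ \mathbf{v}_3 \cdot \mathbf{v}_1 & \mathbf{v}_3 \cdot \mathbf{v}_2 & \mathbf{v}_3 \cdot \mathbf{v}_3 \end{pmatrix}, \]

so $\det V^\dagger V = \operatorname{vol}^2(\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3)$.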
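The construction can be followed step by step on sample vectors. This is a minimal sketch, assuming three illustrative vectors in $\mathbb{R}^4$ and hypothetical helper names (`dot`, `height_vector`, `gram`, `det3`): it solves the $2 \times 2$ system for $a_1, a_2$ by Cramer's rule, forms $\mathbf{h}$, and checks that base area squared times $\|\mathbf{h}\|^2$ equals the $3 \times 3$ determinant of dot products.

```python
from fractions import Fraction


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def height_vector(v1, v2, v3):
    """Return h = v3 - a1*v1 - a2*v2 with h orthogonal to v1 and v2."""
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    r1, r2 = dot(v1, v3), dot(v2, v3)
    base_area_sq = Fraction(g11 * g22 - g12 * g12)
    if base_area_sq == 0:
        raise ValueError("base parallelogram has zero area")
    # Cramer's rule for the 2x2 system in a1, a2.
    a1 = (r1 * g22 - g12 * r2) / base_area_sq
    a2 = (g11 * r2 - g12 * r1) / base_area_sq
    return [z - a1 * x - a2 * y for x, y, z in zip(v1, v2, v3)]


def gram(vectors):
    """Matrix of all pairwise dot products (V^T V for the matrix V of columns)."""
    return [[dot(u, v) for v in vectors] for u in vectors]


def det3(G):
    return (G[0][0] * (G[1][1] * G[2][2] - G[1][2] * G[2][1])
            - G[0][1] * (G[1][0] * G[2][2] - G[1][2] * G[2][0])
            + G[0][2] * (G[1][0] * G[2][1] - G[1][1] * G[2][0]))


v1, v2, v3 = (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1)  # illustrative vectors
h = height_vector(v1, v2, v3)
assert dot(h, v1) == 0 and dot(h, v2) == 0
base_area_sq = dot(v1, v1) * dot(v2, v2) - dot(v1, v2) ** 2
assert base_area_sq * dot(h, h) == det3(gram([v1, v2, v3]))
```

For these vectors $\mathbf{h} = (0, 0, 1, 1)$, the base has area 1, and both sides of the final check equal 2.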

We consider next the 4-parallelepiped $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3 \wedge \mathbf{v}_4$; it has

base, a 3-parallelepiped: $\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \mathbf{v}_3$,
height vector, orthogonal to base: $\mathbf{h} = \mathbf{v}_4 - a_1\mathbf{v}_1 - a_2\mathbf{v}_2 - a_3\mathbf{v}_3$,

and we can define its 4-volume as the product of the 3-volume of its base with the
length of its height vector. The same, rather lengthy, argument we have just carried
out allows us to determine the square of the 4-volume as $\det V^\dagger V$, where $V$ is the
matrix whose columns are the vectors $\mathbf{v}_i$. This process of establishing the volume
of a $k$-parallelepiped from the volume of a $(k - 1)$-parallelepiped is an example of
mathematical induction, and shows for every $k \le n$ that the squared $k$-volume is
$\det V^\dagger V$.


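2.36. Show that the square of the $n$-volume of an $n$-parallelepiped, as defined in the
text, equals $\det V^\dagger V$, as derived in these exercises.

2.37. Let $V$ be the $n \times 3$ matrix whose columns are $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$; verify that

\[ V^\dagger V = \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{v}_1 & \mathbf{v}_1 \cdot \mathbf{v}_2 & \mathbf{v}_1 \cdot \mathbf{v}_3 \\ \mathbf{v}_2 \cdot \mathbf{v}_1 & \mathbf{v}_2 \cdot \mathbf{v}_2 & \mathbf{v}_2 \cdot \mathbf{v}_3 \\ \mathbf{v}_3 \cdot \mathbf{v}_1 & \mathbf{v}_3 \cdot \mathbf{v}_2 & \mathbf{v}_3 \cdot \mathbf{v}_3 \end{pmatrix}. \]

2.38. Let $\mathbf{h} = \mathbf{v}_k - a_1\mathbf{v}_1 - \cdots - a_{k-1}\mathbf{v}_{k-1}$, where $a_1, \ldots, a_{k-1}$ are arbitrary real numbers.
Show that $\mathbf{h}$ is in the (hyper)plane that contains $\mathbf{v}_k$ and is parallel to the
linear subspace of $\mathbb{R}^n$ spanned by $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$.

2.39. Find a vector $\mathbf{h}$ in the plane that contains $\mathbf{v}_3$ and is parallel to $\mathbf{v}_1 \wedge \mathbf{v}_2$, when

a. $\mathbf{v}_1 = (1, 0, 1, 0)$, $\mathbf{v}_2 = (2, 1, 1, 1)$, $\mathbf{v}_3 = (0, 1, 2, 0)$.

b. $\mathbf{v}_1 = (1, 0, -1, 0)$, $\mathbf{v}_2 = (2, 1, 1, 1)$, $\mathbf{v}_3 = (0, 1, 2, 0)$.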

2.40. Determine the rank and nullity of each of the following matrices, viewing 
each as a linear map. 




/0 1 0 0\ 

/ 12 3\ 

b. 

0 0 10 

I 4 56 ; 

0 0 0 1 



V 1 1 V 


1 ^ 2 \ 

\ 1 0 

2 4 

d. 0 1 1 

3 6 

\1 0 1 


2.41. a. Solve the equations

$$5u + 3v - 3w + x = 0,$$
$$3u + 2v + 6w - 2x = 0,$$

for $u$ and $v$ in terms of $w$ and $x$.

b. Can you solve for $w$ and $x$ in terms of $u$ and $v$? What happens?

c. Can you solve for $u$ and $x$ in terms of $v$ and $w$? What is the result?

2.42. This question concerns the linear map $L : \mathbb{R}^3 \to \mathbb{R}^3$ defined by the equations

$$L : \quad
\begin{cases}
x = u + v,\\
y = v + w,\\
z = u - w.
\end{cases}$$

a. What is the dimension of $\ker L$? Give a basis for $\ker L$.

b. Find the matrix for a linear map $M : \mathbb{R}^p \to \mathbb{R}^q$ whose graph is the kernel
of $L$. What are the values of $p$ and $q$?

c. Write $M$ as a set of $q$ equations in $p$ variables.

d. What is the dimension of $\operatorname{im} L$? Give a basis for $\operatorname{im} L$.

e. Find the matrix for a linear map $A : \mathbb{R}^j \to \mathbb{R}^k$ whose graph is the image
of $L$. What are the values of $j$ and $k$?

f. Write $A$ as a set of $k$ equations in $j$ variables.

2.43. a. Find all solutions $(u, v)$ to the equations

$$u - 2v = 5,$$
$$4v - 2u = -10,$$

and sketch the solution set in the $(u, v)$-plane.

b. Describe the solution set in (a) as the graph of a suitable function. Is your
sketch in (a) the graph of that function?

c. Describe the relation between the solution set in part (a) and the set of
solutions to the equations

$$u - 2v = 0,$$
$$4v - 2u = 0.$$


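2.44. Let $L : \mathbb{R}^n \to \mathbb{R}^p$ be an arbitrary linear map, and let

$$V = \{(\mathbf{u}, L(\mathbf{u})) \mid \mathbf{u} \in \mathbb{R}^n\} \subseteq \mathbb{R}^n \times \mathbb{R}^p = \mathbb{R}^{n+p}$$

be the graph of $L$. The purpose of this exercise is to show that $V$ is a linear
subspace of $\mathbb{R}^{n+p}$ of dimension $n$.

a. Show that the sum of two vectors in $V$ is also in $V$. That is, given $\mathbf{v}_1 =
(\mathbf{u}_1, L(\mathbf{u}_1))$ and $\mathbf{v}_2 = (\mathbf{u}_2, L(\mathbf{u}_2))$ with $\mathbf{u}_1$ and $\mathbf{u}_2$ in $\mathbb{R}^n$, show that $\mathbf{v}_1 + \mathbf{v}_2$
also has the form $(\mathbf{w}, L(\mathbf{w}))$ for some suitable $\mathbf{w}$ in $\mathbb{R}^n$.

b. Show that any scalar multiple of a vector in $V$ is also in $V$.

c. Suppose $\mathcal{B} = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is a basis for $\mathbb{R}^n$. Let $\mathbf{v}_j = (\mathbf{u}_j, L(\mathbf{u}_j))$ for
$j = 1, 2, \ldots, n$. Show that

i. $\mathcal{G} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a linearly independent set of vectors in $\mathbb{R}^{n+p}$;

ii. $\mathcal{G}$ spans $V$; that is, any vector in $V$ can be written as a linear
combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

d. Explain why $\dim V = \dim \operatorname{graph} L = n$.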


Chapter 3 

Approximations 


Abstract Approximations are at the heart of calculus. In Chapter 1 we saw that
the transformation of differentials $dx = \varphi'(s)\,ds$ can be traced back to the linear
approximation $\Delta x \approx \varphi'(s)\,\Delta s$ (the microscope equation), and that the factor $\varphi'(s)$
represented a local length multiplier. We also suggested there that the transformation
$dx\,dy = r\,dr\,d\theta$ of differentials from Cartesian to polar coordinates has the same
explanation: the polar coordinate change map has a linear approximation (a
two-variable “microscope” equation) and the factor $r$ is the local area multiplier for that
map. In this chapter we construct a variety of useful approximations to nonlinear
functions of one or more variables. However, we save for the following chapter a
discussion of the most important approximation, the derivative of a map.


3.1 Mean-value theorems 

The derivative indicates how much a function changes. It does this in the microscope
equation, for example, and also in a similar equation called the law of the mean.
First, consider the microscope equation, in this form:

$$f(x) - f(a) \approx f'(a)(x - a) \quad \text{for } x \approx a.$$

This says each change $x - a$ in input produces a change $f(x) - f(a)$ in output that
is approximately $f'(a)$ times as large. Although this equation is only approximate,
note that the multiplier $f'(a)$ that links the changes is the same for all $x$ in some
sufficiently small neighborhood of $a$.

The law of the mean makes a similar statement, but there are important
differences. It says that for each fixed $x$, say $x = b$, we can find a “mean value” $c$
somewhere between $a$ and $b$ for which

$$f(b) - f(a) = f'(c)(b - a).$$

The link between changes in input and output is now exact rather than approximate,
and $b$ need not be near $a$, as in the microscope equation. However, these benefits
come at a cost: the location of $c$ varies with $b$ and, in fact, is not usually explicitly
known, even when $a$ and $b$ are. Also, $f$ has to be continuous from $a$ to $b$. As the
figure indicates, $c$ is to be chosen so that the slope $f'(c)$ is equal to the slope of the
line segment from $(a, f(a))$ to $(b, f(b))$. We can put the law this way:

When $x$ changes from $a$ to $b$, then $f(x)$ undergoes a change that is exactly $f'(c)$
times as large.

The law of the mean is the first of several mean-value theorems we consider, and it
is the most basic: all others follow from it.
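To make the "mean value" $c$ concrete, here is a minimal numerical sketch (not from the original text). It assumes a sample function $f(x) = x^3$ on $[0, 2]$ and locates $c$ by bisection; the helper name `mean_value_point` is illustrative, and the method assumes $f'(x)$ minus the chord slope changes sign between $a$ and $b$.

```python
import math

def mean_value_point(f, df, a, b, tol=1e-12):
    """Find c in [a, b] with df(c) equal to the slope of the chord from
    (a, f(a)) to (b, f(b)), using bisection on df(x) - slope.
    Assumes df(x) - slope has opposite signs at a and b."""
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: df(x) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change of f'(x) - slope")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
a, b = 0.0, 2.0
c = mean_value_point(f, df, a, b)
print(c, f(b) - f(a), df(c) * (b - a))
```

For this sample function the chord slope is 4, so $3c^2 = 4$ and $c = 2/\sqrt{3}$. In general no such closed form is available, which is the cost the text describes.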


Bounding the magnitude of change

Our ignorance of the location of $c$ makes it difficult to use the law of the mean in
some circumstances. However, it is natural to assume that we do have information
about $f$ and its derivative. In particular, we can make use of a bound on the size of
$|f'(x)|$.

Theorem 3.1 (Mean-value theorem). Suppose $f(x)$ is continuously differentiable
for all $x$ between $a$ and $b$; then

$$|f(b) - f(a)| \le \max_{a \le x \le b} |f'(x)|\,|b - a|.$$

□


Although the law of the mean states explicitly (by means of an equality) how
much the function grows (i.e., how $f(b) - f(a)$ depends on $b - a$), the mean-value
theorem just provides a bound on that growth in terms of a bound on the derivative.
As we show below, there is no direct extension of the law of the mean to
vector-valued functions, that is, to functions $\mathbf{x} = \mathbf{f}(\mathbf{v})$ where the value $\mathbf{x}$ is a vector in
$\mathbb{R}^p$ with $p \ge 2$. For a vector-valued function, we are only able to bound its growth by
the size of its derivative, as in Theorem 3.1.

The law of the mean is frequently called the mean-value theorem; however, we
reserve the latter term for the general theorem that extends to vector-valued
functions.

Integral mean-value

Continuing on, we formulate an integral law of the mean.

Theorem 3.2 (Law of the mean for integrals). If $f(x)$ is a continuous function on
the interval from $a$ to $b$, then

$$\int_a^b f(x)\,dx = f(c)(b - a)$$

for at least one point $c$ in that interval.

Proof. Let $F(x)$ be an antiderivative of $f(x)$ (i.e., $F'(x) = f(x)$). The fundamental
theorem of calculus guarantees that $F$ exists because $f$ is continuous. If we now
apply the law of the mean to $F$, we find

$$\int_a^b f(x)\,dx = F(b) - F(a) = F'(c)(b - a) = f(c)(b - a)$$

for some $c$ between $a$ and $b$. □







We can connect this with the mean value of $f$. By definition, the mean value of
a function $y = f(x)$ on the interval $a \le x \le b$ is the constant $\bar f$ whose integral over
$[a, b]$ (the hatched region in the figure) is equal to the integral of $f(x)$ over $[a, b]$ (the
shaded region):

$$\int_a^b f(x)\,dx = \int_a^b \bar f\,dx = \bar f \times (b - a).$$

This leads to the more familiar form of the definition:

$$\text{mean value of } f \text{ on } [a, b] = \bar f = \frac{1}{b - a}\int_a^b f(x)\,dx.$$

Because $\bar f$ lies between the maximum and minimum values of $f(x)$ on $[a, b]$, and
because $f$ is continuous, there is at least one $c$ between $a$ and $b$ for which $f(c) = \bar f$.
In other words, in the equation

$$\int_a^b f(x)\,dx = f(c)(b - a)$$

provided by the integral law of the mean, $f(c)$ is in fact the mean value of $f$ on the
interval $[a, b]$.

Because we know $f(c) = \bar f$ in the integral law of the mean, we can sometimes
determine $c$ explicitly. For example, let $f(x) = \sqrt{r^2 - x^2}$. The graph of $y = f(x)$ is a
semicircle of radius $r$ on the interval $[-r, r]$, so the area under it is $\pi r^2/2$ and

$$\frac{\pi r^2}{2} = \int_{-r}^{r} \sqrt{r^2 - x^2}\,dx = f(c) \cdot 2r = 2r\sqrt{r^2 - c^2}.$$

We can solve this for $c$ and get $c = \pm r\sqrt{1 - \pi^2/16}$; see the exercises.
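The semicircle example can be checked numerically. The sketch below is an illustration only: it assumes the sample radius $r = 2$ and approximates the integral with a composite Simpson rule (the helper `simpson` is not part of the text). Because the integrand has vertical tangents at $\pm r$, the quadrature is only moderately accurate.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

r = 2.0  # sample radius (an assumption for this illustration)
# max(0, ...) guards against tiny negative arguments from rounding at x = r
f = lambda x: math.sqrt(max(0.0, r * r - x * x))

mean_value = simpson(f, -r, r) / (2 * r)   # (1/(b-a)) * integral
c = r * math.sqrt(1 - math.pi ** 2 / 16)   # the claimed mean-value point
print(mean_value, math.pi * r / 4, f(c))
```

All three printed numbers should be close to $\pi r/4$, the mean height of the semicircle.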

The integral law of the mean has us compute an integral by extracting the mean
value of its integrand. The following theorem makes a more general assertion: there
are circumstances where we can compute an integral by extracting the mean value
of just a part of its integrand.

Theorem 3.3 (Generalized law of the mean for integrals). If $f(x)$ and $g(x)$ are
continuous on the interval $[a, b]$ and $g(x) \ge 0$ there, then

$$\int_a^b f(x)g(x)\,dx = f(c)\int_a^b g(x)\,dx$$

for at least one point $c$ in that interval.

Proof. Our proof here echoes the previous argument, which said that $\bar f = f(c)$ for
some $c$ because $\bar f$ lies between the minimum and maximum values of the continuous
function $f$.

Note first that if $g(x) = 0$ for all $x$ in the interval $[a, b]$, then both integrals equal
0 and there is nothing to prove. So we assume $g(x) > 0$ for at least some $x$ in the
interval; because $g$ is continuous and $g(x) \ge 0$, it also follows that

$$\int_a^b g(x)\,dx > 0.$$

Now let $m$ and $M$ be the minimum and maximum values of $f(x)$ on $[a, b]$; thus
$m \le f(x) \le M$. Because $0 \le g(x)$, we also have

$$m\,g(x) \le f(x)g(x) \le M\,g(x)$$

for all $x$ in the interval, and therefore

$$m\int_a^b g(x)\,dx \le \int_a^b f(x)g(x)\,dx \le M\int_a^b g(x)\,dx.$$

We have already noted that the integral of $g$ is nonzero, so we can divide these
inequalities by it and conclude that the expression

$$\frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx}$$

lies between the minimum and maximum values of $f(x)$ on the interval. Therefore,
because $f$ is continuous, this expression must equal $f(c)$ for at least one value $c$ in
the interval. □

The condition $g(x) \ge 0$ can be changed to $g(x) \le 0$ without affecting the truth
of the theorem; however, the theorem can fail if $g(x)$ changes sign on the interval.
You can explore these points in the exercises. The proof of the generalized law of
the mean can be adapted to have the result stated as an inequality, as with the
mean-value theorem for functions.
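A small numerical experiment illustrates both halves of this remark. It is a sketch under stated assumptions: the functions $f(x) = x$, $g(x) = x^2$ on $[0, 1]$ and $g(x) = x$ on $[-1, 1]$ are sample choices, not taken from the text, and `simpson` is an illustrative helper.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

f = lambda x: x

# Case 1: g(x) = x^2 >= 0 on [0, 1]; the theorem applies.
g_good = lambda x: x * x
lhs_good = simpson(lambda x: f(x) * g_good(x), 0.0, 1.0)   # integral of x^3
weight_good = simpson(g_good, 0.0, 1.0)                    # integral of x^2
c = lhs_good / weight_good   # since f(x) = x, f(c) = c
print("c =", c)

# Case 2: g(x) = x changes sign on [-1, 1]; the integral of g is 0 but
# the integral of f*g is not, so no value f(c) can satisfy the equation.
g_bad = lambda x: x
lhs_bad = simpson(lambda x: f(x) * g_bad(x), -1.0, 1.0)    # integral of x^2
weight_bad = simpson(g_bad, -1.0, 1.0)                     # integral of x
print("lhs =", lhs_bad, "integral of g =", weight_bad)
```

In the first case $c$ lands inside $[0, 1]$, as the theorem promises. In the second the right-hand side $f(c)\int g$ is zero for every $c$, while the left-hand side is $2/3$.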


Corollary 3.4 If $f(x)$ and $g(x) \ge 0$ are continuous on $[a, b]$, then

$$\left|\int_a^b f(x)g(x)\,dx\right| \le \max_{a \le x \le b} |f(x)| \int_a^b g(x)\,dx.$$

□


Until now we have considered only functions of a single variable, but there are
analogous mean-value theorems for functions of several variables. To extend the law
of the mean to such functions, it is convenient for us to recast the law in a slightly
different form. Let $\Delta x = b - a$; then the point $c$ that lies between $a$ and $b$ can be
written as $c = a + \theta\,\Delta x$ for some $0 < \theta < 1$, and the law itself can be written as

$$f(a + \Delta x) = f(a) + f'(a + \theta\,\Delta x)\,\Delta x.$$

Now consider a function of several variables $F(\mathbf{x}) = F(x_1, \ldots, x_n)$. The analogue of
the ordinary derivative $f'(x)$ is the gradient of $F$ (constructed with the differential
operator $\nabla$, “nabla”):

$$\nabla F(\mathbf{x}) = \operatorname{grad} F(\mathbf{x}) = \left(\frac{\partial F}{\partial x_1}(\mathbf{x}), \ldots, \frac{\partial F}{\partial x_n}(\mathbf{x})\right).$$




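Theorem 3.5 (Law of the mean). Suppose $F(\mathbf{x}) = F(x_1, \ldots, x_n)$ is a continuously
differentiable function of $\mathbf{x}$; then

$$F(\mathbf{a} + \Delta\mathbf{x}) - F(\mathbf{a}) = \nabla F(\mathbf{a} + \theta\,\Delta\mathbf{x}) \cdot \Delta\mathbf{x},$$

for some $0 < \theta < 1$.

Proof. There is a simple way to reduce this to a single-variable question; let $f(t) =
F(\mathbf{a} + t\,\Delta\mathbf{x}) = F(a_1 + t\,\Delta x_1, \ldots, a_n + t\,\Delta x_n)$. Then the chain rule gives us the derivative
of $f$:

$$f'(t) = \frac{\partial F}{\partial x_1}(\mathbf{a} + t\,\Delta\mathbf{x})\,\Delta x_1 + \cdots + \frac{\partial F}{\partial x_n}(\mathbf{a} + t\,\Delta\mathbf{x})\,\Delta x_n = \nabla F(\mathbf{a} + t\,\Delta\mathbf{x}) \cdot \Delta\mathbf{x}.$$

This is the scalar (or dot) product of the gradient with the vector $\Delta\mathbf{x}$. By the law of
the mean for $f(t)$, we know there is a $0 < \theta < 1$ for which the following is true:

$$F(\mathbf{a} + \Delta\mathbf{x}) - F(\mathbf{a}) = f(1) - f(0) = f'(\theta)(1 - 0) = \nabla F(\mathbf{a} + \theta\,\Delta\mathbf{x}) \cdot \Delta\mathbf{x}. \qquad □$$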
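The reduction to one variable used in the proof translates directly into a computation. The sketch below is illustrative only: it assumes the sample function $F(x, y) = x^2 y$, the base point $\mathbf{a} = (0, 0)$ and the displacement $\Delta\mathbf{x} = (1, 1)$, and finds $\theta$ by bisection (which needs a sign change of the function `g` on $[0, 1]$).

```python
import math

def F(x, y):
    """Sample function of two variables (an assumption for this sketch)."""
    return x * x * y

def grad_F(x, y):
    """Gradient of F, computed by hand: (dF/dx, dF/dy)."""
    return (2 * x * y, x * x)

def find_theta(a, dx, tol=1e-12):
    """Find theta in [0, 1] with grad F(a + theta*dx) . dx = F(a + dx) - F(a)."""
    change = F(a[0] + dx[0], a[1] + dx[1]) - F(a[0], a[1])

    def g(t):
        gx, gy = grad_F(a[0] + t * dx[0], a[1] + t * dx[1])
        return gx * dx[0] + gy * dx[1] - change

    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change on [0, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

a = (0.0, 0.0)
dx = (1.0, 1.0)
theta = find_theta(a, dx)
print(theta)
```

Along this line $f(t) = F(t, t) = t^3$, so $f'(\theta) = 3\theta^2 = 1$ and $\theta = 1/\sqrt{3}$, which the bisection reproduces.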

Thus, even for a function $F(\mathbf{x})$ of several variables, if we use the dot product for
multiplication, we can express the law of the mean in the following way.

When $\mathbf{x}$ changes from $\mathbf{a}$ to $\mathbf{b} = \mathbf{a} + \Delta\mathbf{x}$, the change in $F(\mathbf{x})$ is exactly $\nabla F$ times as
large, where the gradient $\nabla F$ is evaluated at some intermediate (or “mean”) point
along the line from $\mathbf{a}$ to $\mathbf{b}$.

We can rephrase the multivariable law of the mean as an inequality that bounds
the growth of $F$ in terms of a bound on its derivative (i.e., its gradient):

Corollary 3.6 (Mean-value theorem) If $z = F(\mathbf{x})$ is continuously differentiable on
the line that connects $\mathbf{a}$ and $\mathbf{b}$, then

$$|F(\mathbf{b}) - F(\mathbf{a})| \le \max_{\mathbf{x}} \|\nabla F(\mathbf{x})\|\,\|\mathbf{b} - \mathbf{a}\|,$$

where the maximum is taken over all points $\mathbf{x}$ on the line from $\mathbf{a}$ to $\mathbf{b}$. □

For functions of two variables, we have a natural extension of the integral law
of the mean. The proof follows the pattern of all previous proofs; see the exercises.
We deal with functions of three or more variables in a later chapter, after we discuss
their integrals.

Definition 3.1 The mean value $\bar F$ of the function $F(x, y)$ on the domain $D$ in $\mathbb{R}^2$ is

$$\bar F = \frac{1}{\operatorname{area} D}\iint_D F(x, y)\,dx\,dy.$$

Theorem 3.7 (Law of the mean for double integrals). Let $F(x, y)$ be a continuous
function on a connected domain $D$ in $\mathbb{R}^2$. Then there is at least one point $(c, d)$ in $D$
where $F$ takes on its mean value $\bar F$; thus

$$\iint_D F(x, y)\,dx\,dy = F(c, d) \times \operatorname{area} D.$$

□


The problem with vector-valued functions

To complete the present study of mean-value theorems, let us consider
vector-valued functions $\mathbf{x} = \mathbf{f}(\mathbf{v})$, where $\mathbf{x}$ is a vector in $\mathbb{R}^p$, $p \ge 2$. Even in the simplest
such case (a vector-valued function $\mathbf{x} = \mathbf{f}(t)$ of a single variable, which defines a
curve in $\mathbb{R}^p$), we now show there can be no direct extension of the law of the mean
as an equality.

To see why, consider the helix $\mathbf{x} = \mathbf{f}(t) = (\cos t, \sin t, t)$ in $\mathbb{R}^3$. Let us try to express
the change $\mathbf{f}(2\pi) - \mathbf{f}(0)$ in the form $\mathbf{f}'(c)(2\pi - 0)$ for some suitable mean value $c$
between 0 and $2\pi$. The vector $\Delta\mathbf{f} = \mathbf{f}(2\pi) - \mathbf{f}(0) = (0, 0, 2\pi)$ is vertical, but the
derivative

$$\mathbf{f}'(c) = (-\sin c, \cos c, 1)$$

is never vertical, because its first two components are never simultaneously zero.
Therefore, no scalar multiple of $\mathbf{f}'(c)$ will ever equal $\Delta\mathbf{f}$, even approximately. In
particular, there is no number $c$ for which

$$\mathbf{f}(2\pi) - \mathbf{f}(0) = \mathbf{f}'(c)(2\pi - 0).$$

Even though the law of the mean itself fails to hold, we can still get a bound
on the size of the change in $\mathbf{f}$ that is exactly analogous to the bound provided by
Theorem 3.1; in fact, we have

$$\|\mathbf{f}(2\pi) - \mathbf{f}(0)\| \le \max_{0 \le t \le 2\pi} \|\mathbf{f}'(t)\|\,|2\pi - 0|,$$

because the left-hand side equals $2\pi$ and the right-hand side equals $2\sqrt{2}\,\pi$.
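The helix computation is easy to reproduce. The following sketch (illustrative names, sampling on a uniform grid of 1001 points as an assumption) checks that the horizontal part of $\mathbf{f}'(t)$ always has length 1, so $\mathbf{f}'(t)$ is never vertical, and that the bound holds.

```python
import math

def f(t):
    """The helix (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def df(t):
    """Its derivative (-sin t, cos t, 1)."""
    return (-math.sin(t), math.cos(t), 1.0)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Change in f over one full turn: numerically (0, 0, 2*pi) up to rounding.
delta = tuple(p - q for p, q in zip(f(2 * math.pi), f(0.0)))

samples = [2 * math.pi * i / 1000 for i in range(1001)]
# Length of the horizontal part of f'(t); if it never vanishes,
# f'(t) is never a multiple of the vertical vector delta.
min_horizontal = min(math.hypot(df(t)[0], df(t)[1]) for t in samples)
max_speed = max(norm(df(t)) for t in samples)
bound = max_speed * (2 * math.pi - 0)

print(norm(delta), bound, min_horizontal)
```

The left-hand side is $2\pi \approx 6.28$ and the bound is $2\sqrt{2}\,\pi \approx 8.89$, so the inequality is strict here.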

Theorem 3.8 (Mean-value theorem). If $\mathbf{f} : I \to \mathbb{R}^p$ has a continuous derivative on
an interval $I$ that contains $a$ and $b$, then

$$\|\mathbf{f}(b) - \mathbf{f}(a)\| \le \max_{a \le t \le b} \|\mathbf{f}'(t)\|\,|b - a|.$$

Proof. Because $\mathbf{f}(b) - \mathbf{f}(a) = \int_a^b \mathbf{f}'(t)\,dt$, we have

$$\|\mathbf{f}(b) - \mathbf{f}(a)\| = \left\|\int_a^b \mathbf{f}'(t)\,dt\right\| \le \max_{a \le t \le b} \|\mathbf{f}'(t)\| \int_a^b dt \le \max_{a \le t \le b} \|\mathbf{f}'(t)\|\,|b - a|.$$

□

Extension to multivariable inputs

This theorem relies on the fact that we can regard the derivative $\mathbf{f}'(t)$ as a
vector of the same sort as $\mathbf{f}(t)$ itself; this, in turn, is a consequence of the fact that the
input is just a single variable $t$. If, instead, $\mathbf{x} = \mathbf{f}(\mathbf{v})$, where $\mathbf{v}$ is in $\mathbb{R}^p$, $p \ge 2$, then
the derivative of $\mathbf{f}$ is something new. We define this new derivative below
(Definition 3.16, p. 99) and analyze it in the following two chapters. At that time, we state
and prove a natural extension of Theorem 3.8 for maps (Theorem 4.15, p. 140).


3.2 Taylor polynomials in one variable 

You are probably familiar with Taylor polynomials for $f(x)$ written in the form

$$f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n.$$

These expressions give us new ways to approximate $f(x)$. One obvious benefit of
any polynomial approximation is that it can be computed using only the four basic
operations of arithmetic; most functions are not computable in this sense. Taylor
polynomials become better approximations as $n$ increases, and as $x$ gets closer to $a$.
We will also see less obvious reasons that make them valuable.

Consider successive terms in the Taylor polynomial to be separate contributions
to the approximation. Then we find that the lowest-order term makes the largest
contribution (at least when $x$ is close to $a$) whereas the succeeding terms, involving
higher and higher powers of $x - a$, contribute less and less to the total. Also, because
a Taylor polynomial gives a better approximation to $f(x)$ the closer $x$ is to $a$, we
write the polynomial instead in terms of the variable $\Delta x = x - a$ that indicates how
close $x$ is to $a$. We took the same approach when we reformulated the law of the
mean (pp. 74ff.).

Definition 3.2 Suppose $y = f(x)$ has derivatives up to order $n$ at $x = a$; then the
Taylor polynomial of degree $n$ for $f$ at $x = a$ is

$$P_{n,a}(\Delta x) = f(a) + f'(a)\,\Delta x + \frac{f''(a)}{2!}(\Delta x)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(\Delta x)^n.$$

Notice that this expression for $P_{n,a}$ includes each of the Taylor polynomials $P_{0,a}$,
$P_{1,a}$, $P_{2,a}, \ldots, P_{n-1,a}$ as an initial part.

In many cases, it is easy to calculate these polynomials. For example, if $f(x) =
\sqrt{x}$, $n = 3$, and $a = 100$, then

$$P_{3,100}(\Delta x) = 10 + \frac{\Delta x}{20} - \frac{(\Delta x)^2}{8000} + \frac{(\Delta x)^3}{1600000}.$$

Let us see how this gives us approximate values of $\sqrt{x}$ when $x \approx 100$. First we
build a sequence of estimates of $\sqrt{102}$ (so $\Delta x = 2$) by adding in the terms of the
polynomial, one at a time. This allows us to see how the approximation improves as
the degree increases. Second, we then do the same for $\sqrt{120}$ ($\Delta x = 20$). Comparing
the two sets of estimates allows us to see how the approximation improves as $\Delta x$
decreases. In all cases our focus is on the error: that is, on the difference between
the true value and the approximation.




First, we consider how each term in the cubic Taylor polynomial

$$P_{3,100}(2) = 10 + \frac{2}{20} - \frac{4}{8000} + \frac{8}{1600000} = 10.099505$$

contributes to the estimate of $\sqrt{102} = 10.099504938362\ldots$. Here are the results
in a table:

    Degree   Term        Sum          Error = √102 − Sum
    0        10          10            0.0995...          ≈  1 × 10^-1
    1         0.1        10.1         -0.000495...        ≈ -5 × 10^-4
    2        -0.0005     10.0995       0.000004938...     ≈  5 × 10^-6
    3         0.000005   10.099505    -0.0000000616...    ≈ -6 × 10^-8


Thus we see that higher terms contribute less and less to the sum, but they
effectively “fine-tune” the estimate. In fact, the contributions drive the error down
exponentially, that is, the error at each stage made by the intermediate sum $P_{k,100}(2)$ is
roughly of size $10^{-ak-b}$, for some $a > 0$.

Of course the terms get smaller because their coefficients do, and this is clearly
the result of the choice of the original function $f(x) = \sqrt{x}$. For a different function,
the coefficients may not be so obliging. Nevertheless, by comparing the errors that
$P_{k,100}(\Delta x)$ makes for different values of $\Delta x$, we largely eliminate the effect of the
coefficients. At the same time, we see how the error is connected to the size of
$\Delta x$, our second objective. The following table gives comparative information for
$\sqrt{120} = 10.95445115\ldots$ ($\Delta x = 20$).

    Degree   Term     Sum       Error = √120 − Sum
    0        10       10         0.954...       ≈  1 × 10^0
    1         1       11        -0.0455...      ≈ -5 × 10^-2
    2        -0.05    10.95      0.00445...     ≈  4 × 10^-3
    3         0.005   10.955    -0.000548...    ≈ -5 × 10^-4


Compare the rightmost columns of the two tables for $k = 2$ or 3: $x = 102$ is only
1/10 as far from 100 as $x = 120$, and the error that $P_{k,100}$ makes in estimating $\sqrt{102}$
is, roughly speaking, only about $1/10^{k+1}$ times as large as the error for $\sqrt{120}$. Later
(p. 83), we confirm this is true even for $k > 3$.
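The two tables can be regenerated with a few lines of code. This is a minimal sketch, not part of the original text: the function names are illustrative, and the coefficients are the ones of $P_{3,100}$ given above.

```python
import math

def taylor_terms_sqrt_at_100(dx):
    """Terms of degree 0..3 of the Taylor polynomial of sqrt(x) at a = 100."""
    return [10.0, dx / 20.0, -dx ** 2 / 8000.0, dx ** 3 / 1600000.0]

def error_table(dx):
    """Rows (degree, term, partial sum, error = true value - partial sum)."""
    true_value = math.sqrt(100.0 + dx)
    rows = []
    partial = 0.0
    for degree, term in enumerate(taylor_terms_sqrt_at_100(dx)):
        partial += term
        rows.append((degree, term, partial, true_value - partial))
    return rows

rows_102 = error_table(2.0)
rows_120 = error_table(20.0)

for label, rows in (("sqrt(102)", rows_102), ("sqrt(120)", rows_120)):
    print(label)
    for degree, term, partial, error in rows:
        print(f"  {degree}  term={term:<12g} sum={partial:<12.8g} error={error:.3e}")
```

Printing the errors in scientific notation makes the pattern in the rightmost columns easy to compare between the two values of $\Delta x$.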

Our experiments with $\sqrt{x}$ suggest that we should study how

$$\text{error} = f(a + \Delta x) - P_{n,a}(\Delta x)$$

depends on $\Delta x$ and on $n$ in general. The result is contained in Taylor’s theorem,
which we state and prove below. It spells out the error, and with it we are able to see
that $P_{n,a}(\Delta x)$ makes a smaller error than any other polynomial of the same degree
in approximating $f(x)$ near $a$. Before we state the theorem, let us look first at the
simple case when $n = 0$.

The Taylor polynomial for $n = 0$ is just the constant function $P_{0,a}(\Delta x) = f(a)$, so
we have the estimate

$$\text{error} = f(a + \Delta x) - f(a) \approx f'(a)\,\Delta x$$

by using the microscope equation. Because $f'(a)$ is fixed, this expression already
tells us the error is roughly proportional to the size of $\Delta x$ itself. So if $x_1$ is 1/10 as
far from $a$ as $x_2$ (and $x_2$ is still sufficiently close to $a$), then the error that $P_{0,a}$ makes
in estimating $f(x_1)$ will be about 1/10 as large as the error in estimating $f(x_2)$.

By contrast, the law of the mean (p. 74) gives us the exact error, but in terms of a
quantity $0 < t < 1$ whose value we may not be able to determine effectively:

$$f(a + \Delta x) - f(a) = f'(a + t\,\Delta x)\,\Delta x.$$

Because the derivative of $f(a + t\,\Delta x)$ with respect to $t$ is $f'(a + t\,\Delta x)\,\Delta x$ (chain rule),
we can also express the exact error as an integral:

$$\int_0^1 f'(a + t\,\Delta x)\,\Delta x\,dt = f(a + t\,\Delta x)\Big|_0^1 = f(a + \Delta x) - f(a).$$

Although this integral is more complicated-looking than the other ways of writing
the error, it turns out to be the most useful. Let us rewrite the last equation as

$$f(a + \Delta x) = f(a) + \int_0^1 f'(a + t\,\Delta x)\,\Delta x\,dt = P_{0,a}(\Delta x) + R_{0,a}(\Delta x);$$

we call

$$R_{0,a}(\Delta x) = \Delta x \int_0^1 f'(a + t\,\Delta x)\,dt$$

the remainder because it is what is left after we subtract the Taylor polynomial from
the function. Of course it is also the error we make in replacing the function value
by the polynomial value. We are now ready to state Taylor’s theorem; it expresses
the remainder for a general $P_{n,a}$ as an integral.


Theorem 3.9 (Taylor). If $f(x)$ has continuous derivatives up to order $n + 1$ on an
interval containing $a$ and $a + \Delta x$, then

$$f(a + \Delta x) = f(a) + f'(a)\,\Delta x + \frac{f''(a)}{2!}(\Delta x)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(\Delta x)^n + R_{n,a}(\Delta x),$$

$$\text{where } R_{n,a}(\Delta x) = \frac{(\Delta x)^{n+1}}{n!}\int_0^1 f^{(n+1)}(a + t\,\Delta x)(1 - t)^n\,dt.$$

Proof. Because the theorem is essentially a sequence of formulas (one for each
value of $n$), we prove them one at a time, “by induction on $n$.” That is, we first
prove the formula involving $P_{0,a}$, then use it to prove the one involving $P_{1,a}$, then
use that to prove the one involving $P_{2,a}$, and so on, generating each new remainder
as we go. To prove Taylor’s formula for $P_{0,a}$,

$$f(a + \Delta x) = f(a) + \Delta x \int_0^1 f'(a + t\,\Delta x)\,dt,$$

just set $\varphi(t) = f(a + t\,\Delta x)$ and use the fundamental theorem of calculus in the form

$$\varphi(1) = \varphi(0) + \int_0^1 \varphi'(t)\,dt.$$

Jo 

The induction that takes us from one formula to the next is just an integration
by parts carried out on the remainder integral. The integration by parts formula we
use is

$$\int_a^b u\,dv = uv\Big|_a^b - \int_a^b v\,du.$$

To begin, we set $u = f'(a + t\,\Delta x)$ and $dv = dt$ in the formula for $P_{0,a}$. Then
$du = f''(a + t\,\Delta x)\,\Delta x\,dt$ and $v = t + C$, where $C$ is a constant of integration that
we determine in a moment. We have

$$\begin{aligned}
R_{0,a}(\Delta x) &= \Delta x \int_0^1 f'(a + t\,\Delta x)\,dt\\
&= \Delta x\, f'(a + t\,\Delta x)(t + C)\Big|_0^1 - \Delta x \int_0^1 (t + C) f''(a + t\,\Delta x)\,\Delta x\,dt\\
&= \Delta x\, f'(a + \Delta x)(1 + C) - \Delta x\, f'(a)\,C - (\Delta x)^2 \int_0^1 (t + C) f''(a + t\,\Delta x)\,dt.
\end{aligned}$$

If we now set $C = -1$ the first term on the right disappears and the second becomes
$+f'(a)\,\Delta x$, which is exactly the linear term in $P_{1,a}(\Delta x)$. Thus the formula with $P_{0,a}$
becomes (note the sign changes)

$$f(a + \Delta x) = f(a) + R_{0,a}(\Delta x) = \underbrace{f(a) + f'(a)\,\Delta x}_{P_{1,a}(\Delta x)} + \underbrace{(\Delta x)^2 \int_0^1 f''(a + t\,\Delta x)(1 - t)\,dt}_{R_{1,a}(\Delta x)}.$$

The error in replacing $f(a + \Delta x)$ by $P_{1,a}(\Delta x)$ is thus the new remainder,

$$R_{1,a}(\Delta x) = (\Delta x)^2 \int_0^1 f''(a + t\,\Delta x)(1 - t)\,dt.$$

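The second induction step starts with Taylor’s formula when $n = 1$:

$$\begin{aligned}
f(a + \Delta x) &= P_{1,a}(\Delta x) + R_{1,a}(\Delta x)\\
&= f(a) + f'(a)\,\Delta x + (\Delta x)^2 \int_0^1 f''(a + t\,\Delta x)(1 - t)\,dt.
\end{aligned}$$

To integrate by parts here, use $u = f''(a + t\,\Delta x)$, $dv = (1 - t)\,dt$, and $v = -(1 - t)^2/2$;
then

$$\begin{aligned}
R_{1,a}(\Delta x) &= (\Delta x)^2 \int_0^1 f''(a + t\,\Delta x)(1 - t)\,dt\\
&= -(\Delta x)^2 f''(a + t\,\Delta x)\frac{(1 - t)^2}{2}\bigg|_0^1 + (\Delta x)^2 \int_0^1 f^{(3)}(a + t\,\Delta x)\,\Delta x\,\frac{(1 - t)^2}{2}\,dt\\
&= \frac{f''(a)}{2}(\Delta x)^2 + \frac{(\Delta x)^3}{2}\int_0^1 f^{(3)}(a + t\,\Delta x)(1 - t)^2\,dt.
\end{aligned}$$

With $R_{1,a}(\Delta x)$ replaced by the two terms in the last line, our previous equation for
$f(a + \Delta x)$ (i.e., Taylor’s formula when $n = 1$) now reads

$$f(a + \Delta x) = \underbrace{f(a) + f'(a)\,\Delta x + \frac{f''(a)}{2}(\Delta x)^2}_{P_{2,a}(\Delta x)} + \underbrace{\frac{(\Delta x)^3}{2}\int_0^1 f^{(3)}(a + t\,\Delta x)(1 - t)^2\,dt}_{R_{2,a}(\Delta x)}.$$

As we see, this has become Taylor’s formula when $n = 2$.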

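In the next step, we are able to see how the factorial expressions arise. Our
starting point is $f(a + \Delta x) = P_{2,a}(\Delta x) + R_{2,a}(\Delta x)$, and we must integrate $R_{2,a}(\Delta x)$ by
parts. If we use $u = f^{(3)}(a + t\,\Delta x)$ and $v = -(1 - t)^3/3$, then

$$\begin{aligned}
R_{2,a}(\Delta x) &= -\frac{(\Delta x)^3}{3 \cdot 2} f^{(3)}(a + t\,\Delta x)(1 - t)^3\bigg|_0^1 + \frac{(\Delta x)^3}{3 \cdot 2}\int_0^1 f^{(4)}(a + t\,\Delta x)\,\Delta x\,(1 - t)^3\,dt\\
&= \frac{f^{(3)}(a)}{3!}(\Delta x)^3 + \frac{(\Delta x)^4}{3!}\int_0^1 f^{(4)}(a + t\,\Delta x)(1 - t)^3\,dt.
\end{aligned}$$

Consequently, Taylor’s formula when $n = 2$ becomes

$$\begin{aligned}
f(a + \Delta x) &= P_{2,a}(\Delta x) + R_{2,a}(\Delta x)\\
&= \underbrace{P_{2,a}(\Delta x) + \frac{f^{(3)}(a)}{3!}(\Delta x)^3}_{P_{3,a}(\Delta x)} + \frac{(\Delta x)^4}{3!}\int_0^1 f^{(4)}(a + t\,\Delta x)(1 - t)^3\,dt,
\end{aligned}$$

which is just Taylor’s formula when $n = 3$.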

To complete the induction, we must transform Taylor’s formula when $n = k$,

$$f(a + \Delta x) = P_{k,a}(\Delta x) + R_{k,a}(\Delta x),$$

(where $k$ is any nonnegative integer) into the corresponding formula when $n = k + 1$,

$$f(a + \Delta x) = P_{k+1,a}(\Delta x) + R_{k+1,a}(\Delta x).$$

This is another integration by parts (see the exercises):

$$\begin{aligned}
R_{k,a}(\Delta x) &= \frac{(\Delta x)^{k+1}}{k!}\int_0^1 f^{(k+1)}(a + t\,\Delta x)(1 - t)^k\,dt\\
&= \frac{f^{(k+1)}(a)}{(k+1)!}(\Delta x)^{k+1} + \frac{(\Delta x)^{k+2}}{(k+1)!}\int_0^1 f^{(k+2)}(a + t\,\Delta x)(1 - t)^{k+1}\,dt.
\end{aligned}$$

It implies

$$f(a + \Delta x) = \underbrace{P_{k,a}(\Delta x) + \frac{f^{(k+1)}(a)}{(k+1)!}(\Delta x)^{k+1}}_{P_{k+1,a}(\Delta x)} + \underbrace{\frac{(\Delta x)^{k+2}}{(k+1)!}\int_0^1 f^{(k+2)}(a + t\,\Delta x)(1 - t)^{k+1}\,dt}_{R_{k+1,a}(\Delta x)},$$

completing the general induction step.

□
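Taylor’s formula with the integral remainder can be tested numerically for the running example $f(x) = \sqrt{x}$, $a = 100$. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the derivatives of $\sqrt{x}$ come from the power rule, and the remainder integral is approximated with a composite Simpson rule.

```python
import math

def sqrt_derivative(m, x):
    """m-th derivative of sqrt(x) for x > 0, by the power rule:
    (1/2)(1/2 - 1)...(1/2 - m + 1) * x**(1/2 - m)."""
    coeff = 1.0
    for j in range(m):
        coeff *= 0.5 - j
    return coeff * x ** (0.5 - m)

def taylor_poly(n, a, dx):
    """Taylor polynomial P_{n,a}(dx) for f = sqrt."""
    return sum(sqrt_derivative(m, a) * dx ** m / math.factorial(m)
               for m in range(n + 1))

def integral_remainder(n, a, dx, steps=2000):
    """R_{n,a}(dx) = dx^(n+1)/n! * integral_0^1 f^(n+1)(a + t dx)(1 - t)^n dt,
    with the integral approximated by Simpson's rule (steps must be even)."""
    g = lambda t: sqrt_derivative(n + 1, a + t * dx) * (1 - t) ** n
    h = 1.0 / steps
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return dx ** (n + 1) / math.factorial(n) * total * h / 3

a = 100.0
for n in range(4):
    for dx in (2.0, 20.0):
        exact = math.sqrt(a + dx)
        rebuilt = taylor_poly(n, a, dx) + integral_remainder(n, a, dx)
        print(n, dx, exact - rebuilt)
```

Each printed difference should be at the level of rounding and quadrature error, consistent with $f(a + \Delta x) = P_{n,a}(\Delta x) + R_{n,a}(\Delta x)$ for every $n$ tested.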


Our approximations of $\sqrt{102}$ and $\sqrt{120}$ suggested that the error $R_{n,a}(\Delta x)$ gets
small as $\Delta x$ does. In fact, in those examples we saw that the error got small faster
than $\Delta x$; $R_{n,a}(\Delta x)$ vanished like $(\Delta x)^{n+1}$. This is true in general; to see why, we first
write $R_{n,a}(\Delta x)$ in some alternate forms.

Corollary 3.10 (Lagrange’s form of the remainder) For each $\Delta x \approx 0$, there is a
$\theta = \theta(\Delta x)$ with $0 \le \theta \le 1$ for which

$$R_{n,a}(\Delta x) = \frac{f^{(n+1)}(a + \theta\,\Delta x)}{(n+1)!}(\Delta x)^{n+1}.$$

Proof. With the generalized integral law of the mean (Theorem 3.3), we can extract
$f^{(n+1)}(a + t\,\Delta x)$ from the integral defining $R_{n,a}(\Delta x)$, and then compute the integral
of the remaining function, $(1 - t)^n$, exactly. Thus, for a given $\Delta x$, there is a point $\theta$
in the interval $[0, 1]$ for which

$$\begin{aligned}
R_{n,a}(\Delta x) &= \frac{(\Delta x)^{n+1}}{n!}\int_0^1 f^{(n+1)}(a + t\,\Delta x)(1 - t)^n\,dt\\
&= \frac{(\Delta x)^{n+1}}{n!} f^{(n+1)}(a + \theta\,\Delta x)\int_0^1 (1 - t)^n\,dt\\
&= \frac{(\Delta x)^{n+1}}{n!} f^{(n+1)}(a + \theta\,\Delta x)\left(-\frac{(1 - t)^{n+1}}{n+1}\right)\bigg|_0^1\\
&= \frac{f^{(n+1)}(a + \theta\,\Delta x)}{(n+1)!}(\Delta x)^{n+1}.
\end{aligned}$$

□


Taylor’s formula and the 
law of the mean 


If we write Taylor’s formula for $n = 0$ using the Lagrange form of the remainder,
we get

$$f(a + \Delta x) = f(a) + f'(a + \theta\,\Delta x)\,\Delta x,$$

which is just the law of the mean. Thus we can see Taylor’s formula with Lagrange’s
remainder as an extension of the law of the mean that incorporates higher powers of
the displacement $\Delta x$.

With the remainder in Lagrange’s form we can see why we said, on page 78, that
the error $P_{k,100}$ makes in estimating $\sqrt{102}$ would be only about $1/10^{k+1}$ times the
error in estimating $\sqrt{120}$. Consider first $k = 3$; because $f(x) = \sqrt{x}$,

$$R_{3,100}(\Delta x) = \frac{f^{(4)}(100 + \theta\,\Delta x)}{4!}(\Delta x)^4 = \frac{-15}{16 \times 24\,(100 + \theta\,\Delta x)^{7/2}}(\Delta x)^4.$$

Because the number $100 + \theta\,\Delta x$ certainly lies between 100 and 120, the coefficient
$-15/(16 \times 24\,(100 + \theta\,\Delta x)^{7/2})$ will lie in the narrow range from $-4 \times 10^{-9}$ to
$-2 \times 10^{-9}$ for both estimates. So the main cause of the difference between the
two errors must be the factor $(\Delta x)^4$: in the two cases, its values are $2^4 = 16$
and $20^4 = 16 \times 10^4$. This is why $R_{3,100}(2)$ is only about $1/10^4$ times as large as
$R_{3,100}(20)$. Furthermore, because $100 \le 100 + \theta\,\Delta x \le 120$, we see that the errors
themselves must lie in the following ranges:

$$-6.4 \times 10^{-8} < R_{3,100}(2) < -3.2 \times 10^{-8},$$
$$-6.4 \times 10^{-4} < R_{3,100}(20) < -3.2 \times 10^{-4}.$$

In fact, we already obtained by direct calculation the values $R_{3,100}(2) \approx -6 \times 10^{-8}$
and $R_{3,100}(20) \approx -5 \times 10^{-4}$; they fit into these ranges, as they should.

Now take an arbitrary $k \ge 2$; then (see the exercises)

$$R_{k,100}(\Delta x) = \frac{\pm 1 \cdot 3 \cdots (2k - 1)}{2^{k+1}(k+1)!\,(100 + \theta\,\Delta x)^{k+1/2}}(\Delta x)^{k+1}.$$

The term $1/(100 + \theta\,\Delta x)^{k+1/2}$ again varies over a small range of values when we
have $0 \le \Delta x \le 20$; the main cause of the variation of $R_{k,100}(\Delta x)$ with $\Delta x$ is the factor
$(\Delta x)^{k+1}$. Therefore,

$$\frac{R_{k,100}(2)}{R_{k,100}(20)} \approx \frac{2^{k+1}}{20^{k+1}} = \frac{1}{10^{k+1}},$$

so the error $P_{k,100}$ makes in estimating $\sqrt{102}$ is only about $1/10^{k+1}$ times as large
as the error in estimating $\sqrt{120}$.
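The predicted ratio $1/10^{k+1}$ can be compared with the actual errors. This is a minimal sketch using illustrative helper names; it stops at $k = 4$ because for larger $k$ the error at $\Delta x = 2$ approaches the limits of double-precision arithmetic.

```python
import math

def sqrt_derivative(m, x):
    """m-th derivative of sqrt(x) for x > 0 (power rule)."""
    coeff = 1.0
    for j in range(m):
        coeff *= 0.5 - j
    return coeff * x ** (0.5 - m)

def taylor_poly(k, a, dx):
    """Taylor polynomial P_{k,a}(dx) for f = sqrt."""
    return sum(sqrt_derivative(m, a) * dx ** m / math.factorial(m)
               for m in range(k + 1))

ratios = {}
for k in range(5):
    error_102 = math.sqrt(102.0) - taylor_poly(k, 100.0, 2.0)
    error_120 = math.sqrt(120.0) - taylor_poly(k, 100.0, 20.0)
    ratios[k] = error_102 / error_120
    # Scaled ratio: a value near 1 means the ratio is close to 10^-(k+1).
    print(k, ratios[k], ratios[k] * 10 ** (k + 1))
```

The scaled ratios come out slightly above 1 rather than exactly 1, because the factor $1/(100 + \theta\,\Delta x)^{k+1/2}$ is a little larger for $\Delta x = 2$ than for $\Delta x = 20$.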

Because $f^{(n+1)}$ is a continuous function, the factor $f^{(n+1)}(a + \theta\,\Delta x)$ in the
Lagrange remainder is as close as we wish to $f^{(n+1)}(a)$ if $\Delta x$ is sufficiently close to 0.
This gives us another form of the remainder.

Corollary 3.11 (Generalized microscope equation) When $\Delta x \approx 0$,

$$R_{n,a}(\Delta x) \approx \frac{f^{(n+1)}(a)}{(n+1)!}(\Delta x)^{n+1}.$$

□



Notice that when $n = 0$, $R_{0,a}(\Delta x) \approx f'(a)\,\Delta x$, and Taylor’s formula becomes the
microscope equation,

$$f(a + \Delta x) \approx f(a) + f'(a)\,\Delta x.$$

This is why we call the statement in the corollary the generalized microscope
equation. Taylor’s formula thus generalizes both the microscope equation and the law of
the mean (Lagrange’s form).

The next term is most of the remainder

The generalized microscope equation is a remarkable result. It says that the
remainder looks more and more like the next term in the Taylor expansion of $f$, the
more we magnify the graph of the remainder in a microscope window centered
at $\Delta x = 0$. Thus, because most of the error at the $n$th stage is equal to the term of
degree $n + 1$, we can eliminate most of the error by adding that term to the $n$th stage
(i.e., to $P_{n,a}$). The result is, of course, the next Taylor polynomial, $P_{n+1,a}$. This is
perhaps the simplest and most intuitive way of seeing how the Taylor polynomials
arise.

An example

Here is a visual example to help make it clear how much the remainder looks
like the next term in the Taylor expansion. Let us again use the function $f(x) = \sqrt{x}$
at $a = 100$, but this time just the second-degree approximation instead of the third.
According to the generalized microscope equation, the graph of the remainder,

$$R_{2,100}(\Delta x) = f(100 + \Delta x) - P_{2,100}(\Delta x) = \sqrt{100 + \Delta x} - \left(10 + \frac{\Delta x}{20} - \frac{(\Delta x)^2}{8000}\right),$$

should look like the cubic term in $P_{3,100}$, at least if the domain is suitably restricted
to a small interval around $\Delta x = 0$. Below we see two views of this graph.

Macroscopic versus microscopic

On the left, where it is plotted over a large domain, $-100 \le \Delta x \le 100$, the graph
looks only vaguely like a cubic. It fails to have the necessary symmetry, for example.
But on the right, where its domain has been shrunk to $-1 \le \Delta x \le 1$ (and the vertical
scale has been exaggerated), the graph is now indistinguishable from the graph of
the cubic

$$\Delta y = \frac{(\Delta x)^3}{1600000} = 6.25 \times 10^{-7} \times (\Delta x)^3.$$
















So _/? 2 ,ioo(Ax) does indeed look like the cubic term (Ax) 3 /1600000 in a sufficiently 
small window centered at Ax = 0. Note that we needed to exaggerate the vertical 
scale; the graph would otherwise have appeared to be just a horizontal line! The 
vertical scale is linked to the small coefficient (1/1600000 « 6 x 10 7 ) of (Ax) 3 . 

The following corollary says that $|R_{n,a}(\Delta x)|$ is bounded by (a multiple of)
$|\Delta x|^{n+1}$. It relies on the continuity of the $(n+1)$st derivative of $f$, which was one of
the original hypotheses of Taylor's theorem. The case $n = 0$ is the ordinary mean-value
theorem for $f$ (Theorem 3.1, p. 72).

Corollary 3.12 Let $f^{(n+1)}(a + \Delta x)$ be continuous for $|\Delta x| < r$; then

$$|R_{n,a}(\Delta x)| \le \max_x |f^{(n+1)}(x)| \, \frac{|\Delta x|^{n+1}}{(n+1)!},$$

where the maximum is taken over all $x$ between $a$ and $a + \Delta x$, inclusive.

Proof. When $0 \le t \le 1$, then $x = a + t\,\Delta x$ lies between $a$ and $a + \Delta x$, inclusive. The
continuous function $f^{(n+1)}(a + t\,\Delta x)$ has a finite maximum on this closed interval.
Therefore we have

$$|R_{n,a}(\Delta x)| = \left| \frac{(\Delta x)^{n+1}}{n!} \int_0^1 f^{(n+1)}(a + t\,\Delta x)(1-t)^n \, dt \right|$$

$$\le \frac{|\Delta x|^{n+1}}{n!} \max_{0 \le t \le 1} |f^{(n+1)}(a + t\,\Delta x)| \int_0^1 (1-t)^n \, dt$$

$$= \max_x |f^{(n+1)}(x)| \, \frac{|\Delta x|^{n+1}}{(n+1)!},$$

because the value of the last integral is $1/(n+1)$. □
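
The bound in Corollary 3.12 is easy to test on a function whose derivatives are known. The sketch below is our own illustration, assuming the choice $f(x) = e^x$ at $a = 0$ (so every derivative is $e^x$, and its maximum between $0$ and $\Delta x$ is attained at the larger endpoint); the helper names are hypothetical.

```python
import math

def taylor_exp(n, dx):
    """Taylor polynomial of degree n for exp at a = 0."""
    return sum(dx**k / math.factorial(k) for k in range(n + 1))

def remainder_bound(n, dx):
    """max |f^(n+1)| between 0 and dx, times |dx|^(n+1) / (n+1)!."""
    max_deriv = math.exp(max(0.0, dx))  # exp is increasing
    return max_deriv * abs(dx)**(n + 1) / math.factorial(n + 1)

def check(n, dx):
    """Return (actual |R_n(dx)|, bound from Corollary 3.12)."""
    actual = abs(math.exp(dx) - taylor_exp(n, dx))
    return actual, remainder_bound(n, dx)

for n in (0, 2, 4):
    for dx in (-1.0, 0.2, 1.0):
        actual, bound = check(n, dx)
        print(f"n={n} dx={dx:5.1f}  |R|={actual:.3e}  bound={bound:.3e}")
```

In each printed row the actual remainder should not exceed the bound; the case $n = 0$ is the mean-value estimate.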

With this corollary we finally have a simple and useful way to describe the size
of the error and, in particular, how rapidly it vanishes as $\Delta x \to 0$. Here is the basic
idea (expressed in terms of a variable $t$): although any positive power of a variable
$t$ vanishes (i.e., tends to 0) as $t \to 0$, a higher power vanishes more rapidly, or as we
say, to a higher order. For example, $t^3$ vanishes to a higher order than $t^2$ because
the quotient $t^3/t^2$ also vanishes as $t \to 0$. We say that $t$ vanishes to the first order,
and $t^p$ vanishes to order $p$ (for any positive power $p > 0$).

To describe the order of vanishing of an arbitrary function $\varphi(t)$ as $t \to 0$, we
define what it means for $\varphi(t)$ to vanish to higher order than a given power of $t$,
mimicking the way we compared $t^3$ and $t^2$. For the moment, we are concerned with
the order of vanishing of $\varphi(t)$ only as $t \to 0$; later in the section we generalize to the
case $t \to a$ for $a$ arbitrary.


Definition 3.3 We say $\varphi(t)$ vanishes to order greater than $p$ (at the origin), and
write $\varphi(t) = o(p)$, if

$$\lim_{t \to 0} \frac{\varphi(t)}{t^p} = 0.$$


The symbol “o” is called little oh and is meant to suggest the word order. Read
“$o(p)$” as either “of order greater than $p$” or just “little oh of $p$.”

The condition $\varphi(t) = o(p)$ is imprecise: when $\varphi(t)$ vanishes to higher order than
$t^p$, we do not know how much higher: if $\varphi(t) = o(p)$, then $\varphi(t) = o(q)$ for every
$q < p$. To get a more precise condition, let us look more closely at the ratio $\varphi(t)/t^p$
in the case when $\varphi(t) = Ct^m$ ($C \ne 0$); then, for $0 < p < \infty$, we have

$$\lim_{t \to 0} \frac{\varphi(t)}{t^p} = \begin{cases} 0, & p < m, \\ C, & p = m, \\ \infty, & p > m. \end{cases}$$


It is evident that we get the most information about $\varphi(t)$ not when the limit is zero
but when it takes a finite nonzero value: that is, when $p = m$. Our example suggests
that, to gain additional precision about the order of vanishing of an arbitrary $\varphi(t)$,
we should focus on the value of $p$ for which the ratio $\varphi(t)/t^p$ has a finite nonzero
limit. This is certainly the right idea, but there are two technical stumbling blocks.

First, consider the example of $\varphi(t) = 1/\ln|t|$. This vanishes at 0, but there is no
$p > 0$ for which the limit of $\varphi(t)/t^p$ is finite and nonzero (see the exercises). There is
no way around this problem; some functions that do vanish still fail to vanish like
any positive power of $t$.
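
A numerical experiment makes the first stumbling block plausible, although it proves nothing. The sketch below is ours; it assumes the sample exponent $p = 0.1$ and a handful of sample points $t = 10^{-k}$, and the names `phi` and `ratio` are hypothetical.

```python
import math

def phi(t):
    """phi(t) = 1 / ln|t|, which tends to 0 as t -> 0."""
    return 1.0 / math.log(abs(t))

def ratio(t, p):
    """|phi(t)| / |t|^p; for p > 0 this is expected to be unbounded near 0."""
    return abs(phi(t)) / abs(t)**p

p = 0.1
ts = [10.0**(-k) for k in (50, 100, 200, 300)]
ratios = [ratio(t, p) for t in ts]

for t, value in zip(ts, ratios):
    print(f"t={t:.0e}  |phi(t)|={abs(phi(t)):.3e}  ratio={value:.3e}")
```

Even for so small a power as $p = 0.1$, the ratio grows as $t$ shrinks while $\varphi(t)$ itself tends to 0, so $\varphi$ vanishes more slowly than $t^{0.1}$.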

Second, consider the two-variable function $\varphi(x,y) = x^2 + 2y^2$. (In the next section
we extend Taylor's theorem to functions of several variables; the remainder is
likewise a function of several variables, and we have to consider its order of vanishing.)
It is reasonable to say $\varphi(x,y)$ vanishes to the same order as $x^2 + y^2$; they are
both homogeneous quadratic polynomials. However,

$$\lim_{(x,y) \to (0,0)} \frac{x^2 + 2y^2}{x^2 + y^2}$$

does not exist. One way to see this is to note that, on the radial line $y = mx$,
the ratio reduces to $(1 + 2m^2)/(1 + m^2)$, a quantity that takes values between 1 and 2 as
$m$ varies. Nevertheless, we do have

$$1 \le \frac{x^2 + 2y^2}{x^2 + y^2} \le 2 \quad \text{for all } (x,y) \ne (0,0),$$

and this is enough to guarantee that $x^2 + 2y^2$ and $x^2 + y^2$ each vanishes as rapidly as
the other. In fact, it is sufficient if such upper and lower bounds exist for all $(x,y)$
sufficiently close to $(0,0)$.
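
The behavior of this ratio along radial lines can be tabulated directly. The following sketch is our own illustration (the function names are hypothetical); it evaluates the ratio at points approaching the origin along several lines $y = mx$.

```python
def ratio(x, y):
    """(x^2 + 2y^2) / (x^2 + y^2), defined away from the origin."""
    return (x * x + 2 * y * y) / (x * x + y * y)

def along_line(m, x):
    """Value of the ratio at the point (x, m*x) on the line y = m x."""
    return ratio(x, m * x)

for m in (0.0, 0.5, 1.0, 3.0):
    values = [along_line(m, x) for x in (1e-1, 1e-3, 1e-6)]
    expected = (1 + 2 * m * m) / (1 + m * m)
    print(f"m={m}: values={values}  (1+2m^2)/(1+m^2)={expected}")
```

On each line the ratio is constant, but the constant depends on $m$; that is why the two-variable limit fails to exist even though the ratio stays between 1 and 2.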

Definition 3.4 The functions $\varphi(t)$ and $\psi(t)$ vanish to the same order (at the origin)
if there are positive constants $\delta$, $C_1$, and $C_2$ for which

$$C_1 \le \left| \frac{\varphi(t)}{\psi(t)} \right| \le C_2 \quad \text{for all } 0 < |t| < \delta.$$

We can rewrite the inequalities one at a time so as to indicate that each function
“dominates” the other in a completely symmetric way:

$$|\psi(t)| \le \frac{1}{C_1} |\varphi(t)|; \qquad |\varphi(t)| \le C_2 |\psi(t)|.$$

According to the first, $\psi(t)$ vanishes at least to the same order as $\varphi(t)$; according
to the second, $\varphi(t)$ vanishes at least to the same order as $\psi(t)$. In the following
definition, we have only one of these two inequalities, and the comparison is being
made with a power.

Definition 3.5 We say $\varphi(t)$ vanishes at least to order $p$ (at the origin), and write
$\varphi(t) = O(p)$, if there are positive constants $\delta$, $C$ for which $|\varphi(t)| \le C|t|^p$ when
$|t| < \delta$. Otherwise, we say $\varphi(t)$ fails to vanish to order $p$, and write $\varphi(t) \ne O(p)$.

The symbol “O” is called big oh and like “o” it is meant to suggest the word
order. Read “$O(p)$” as either “of order at least $p$” or as “big oh of $p$.” Note the
following.

• $\varphi(t) = O(p)$ implies $\varphi(t) = O(\alpha)$ for all $0 < \alpha < p$.

• $\varphi(t) \ne O(p)$ implies $\varphi(t) \ne O(\beta)$ for all $\beta > p$.

• $\varphi(t) = o(p)$ implies $\varphi(t) = O(p)$, but the converse is not true: if $\varphi(t)$ vanishes at
least to order $p$, there is no reason to think $\varphi(t)$ vanishes to higher order than $p$
(e.g., $t^p = O(p)$ but $t^p \ne o(p)$). See also Exercise 3.15.

(We use O and o to indicate “order of vanishing;” however, in other settings they 
are used to indicate “order of magnitude.” We avoid this phrase, though, because 
magnitude implies, etymologically at least, “largeness,” not the “smallness” with 
which we are dealing.) 

Big oh notation gives us the right level of precision to describe the order of vanishing
of an approximation error. With it we get a convenient and vivid way to
rewrite Taylor's formula. The first step is to restate Corollary 3.12 (p. 85) in the new
language.

Corollary 3.13 $R_{n,a}(\Delta x) = O(n+1)$. □

Next, we enlarge the meaning of $O(p)$ to allow it to stand for an otherwise unspecified
function that vanishes at least to order $p$ (or even allow it to stand for the
set of such functions). Then, with this in mind, we can rewrite Taylor's formula in
the following simple form that indicates just the order of the remainder:

$$f(a + \Delta x) = f(a) + f'(a)\,\Delta x + \frac{f''(a)}{2!}(\Delta x)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(\Delta x)^n + O(n+1).$$

In words: $f(a + \Delta x)$ equals the Taylor polynomial of degree $n$ plus some unspecified
function that vanishes to order $n + 1$ in $\Delta x$. Often this level of precision is all we
need. Consider, for example, the infinite Taylor series for $f(x) = \ln x$ at $a = 1$:



$$\ln(1 + \Delta x) = \underbrace{\Delta x - \frac{(\Delta x)^2}{2} + \cdots + (-1)^{n-1} \frac{(\Delta x)^n}{n}}_{P_{n,1}(\Delta x)} + \underbrace{(-1)^n \frac{(\Delta x)^{n+1}}{n+1} + (-1)^{n+1} \frac{(\Delta x)^{n+2}}{n+2} + \cdots}_{O(n+1)}.$$

The first $n$ terms constitute the Taylor polynomial $P_{n,1}(\Delta x)$; the rest are the remainder
$O(n+1)$ in an explicit form. This shows how apt it is to think of $O(n+1)$ as a
shorthand for “the terms that vanish at least to order $n + 1$.”
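
To see the statement $R_{n,1}(\Delta x) = O(n+1)$ in numbers, one can divide the remainder by $(\Delta x)^{n+1}$ and watch the quotient stay bounded. The sketch below is ours, assuming the sample degree $n = 3$ and a few sample steps; `taylor_log` and `scaled_remainder` are hypothetical names.

```python
import math

def taylor_log(n, dx):
    """P_{n,1}(dx) = dx - dx^2/2 + ... + (-1)^(n-1) dx^n / n."""
    return sum((-1)**(k - 1) * dx**k / k for k in range(1, n + 1))

def scaled_remainder(n, dx):
    """R_{n,1}(dx) / dx^(n+1), which should stay bounded near dx = 0."""
    return (math.log1p(dx) - taylor_log(n, dx)) / dx**(n + 1)

n = 3
scaled = [scaled_remainder(n, dx) for dx in (0.2, 0.1, 0.05)]
for dx, s in zip((0.2, 0.1, 0.05), scaled):
    print(f"dx={dx:4.2f}  R/dx^{n + 1} = {s:.5f}")
```

The quotient should settle near the coefficient $(-1)^n/(n+1)$ of the first omitted term, which for $n = 3$ is $-1/4$.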


Definition 3.6 We say $\varphi(t)$ and $\psi(t)$ agree at least to order $p$ in $t$, and write
$\varphi(t) = \psi(t) + O(p)$, if $\varphi(t) - \psi(t) = O(p)$.

With this definition, we can put Taylor's formula,

$$f(a + \Delta x) = P_{n,a}(\Delta x) + O(n+1),$$

into these words: “$f(a + \Delta x)$ and $P_{n,a}(\Delta x)$ agree (or are equal) at least to order $n + 1$
in $\Delta x$ when $\Delta x$ is near 0.”

Taylor’s theorem tells us just half the story about the Taylor polynomial, namely, 
how well it approximates a given function. The other half of the story is that the 
Taylor polynomial is unique: no other polynomial of the same degree approximates 
the function as well. Theorem 3.14, below, explains just what this means. 

Lemma 3.1. If $\varphi(t)/t^p \to \infty$ as $t \to 0$, then $\varphi(t) \ne O(p)$.

Proof. (By contradiction.) Suppose that $\varphi(t) = O(p)$; then $|\varphi(t)/t^p|$ would be
bounded when $t \approx 0$. However, this contradicts the hypothesis that $\varphi(t)/t^p \to \infty$
as $t \to 0$. □

Theorem 3.14. Suppose $Q(\Delta x)$ is a polynomial of degree $n$ that differs from the
Taylor polynomial $P_{n,a}(\Delta x)$ at least in the term of degree $k$; then $f(a + \Delta x) - Q(\Delta x)$
fails to vanish to order $k + 1$, and hence

$$f(a + \Delta x) - Q(\Delta x) \ne O(n+1).$$


Proof. The difference $S(\Delta x) = Q(\Delta x) - P_{n,a}(\Delta x)$ is also a polynomial of degree at most $n$:

$$S(\Delta x) = a_0 + a_1 \Delta x + \cdots + a_k (\Delta x)^k + a_{k+1} (\Delta x)^{k+1} + \cdots + a_n (\Delta x)^n,$$

and $a_k \ne 0$ by hypothesis. Therefore

$$\frac{S(\Delta x)}{(\Delta x)^{k+1}} = \frac{a_0}{(\Delta x)^{k+1}} + \frac{a_1}{(\Delta x)^k} + \cdots + \frac{a_k}{\Delta x} + a_{k+1} + \cdots + a_n (\Delta x)^{n-k-1},$$

and this becomes infinite as $\Delta x \to 0$ (even if $a_0 = a_1 = \cdots = a_{k-1} = 0$), because
$a_k \ne 0$. The error made by using $Q(\Delta x)$ to approximate $f(a + \Delta x)$ is

$$f(a + \Delta x) - Q(\Delta x) = f(a + \Delta x) - P_{n,a}(\Delta x) - S(\Delta x) = R_{n,a}(\Delta x) - S(\Delta x).$$

Therefore

$$\frac{f(a + \Delta x) - Q(\Delta x)}{(\Delta x)^{k+1}} = \frac{R_{n,a}(\Delta x)}{(\Delta x)^{k+1}} - \frac{S(\Delta x)}{(\Delta x)^{k+1}}$$

and, as we have seen, the second term becomes infinite as $\Delta x \to 0$. However, the first
term remains bounded, because $R_{n,a}(\Delta x) = O(k+1)$ for all $k \le n$. Therefore, the
two terms together become infinite. It follows from the lemma that

$$f(a + \Delta x) - Q(\Delta x) \ne O(k+1),$$

and hence $f(a + \Delta x) - Q(\Delta x) \ne O(n+1)$. □

Corollary 3.15 If $K \le n$ is the degree of the lowest order term where $Q$ and $P_{n,a}$
differ, then

$$f(a + \Delta x) - Q(\Delta x) = O(K), \qquad f(a + \Delta x) - Q(\Delta x) \ne O(K+1).$$

Proof. Exercise 3.21. □
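
The “best fit” property can be watched in a small experiment. The sketch below is our own and rests on illustrative assumptions: $f(x) = e^x$ at $a = 0$, degree $n = 3$, and a competitor $Q$ that differs from $P_{3,0}$ only in the coefficient of $(\Delta x)^2$, so that $k = K = 2$. All names are hypothetical.

```python
import math

def p3(dx):
    """Taylor polynomial of degree 3 for exp at a = 0."""
    return 1 + dx + dx**2 / 2 + dx**3 / 6

def q3(dx):
    """A competitor: same as p3 except for the degree-2 coefficient (k = 2)."""
    return 1 + dx + 0.51 * dx**2 + dx**3 / 6

steps = (1e-1, 1e-2, 1e-3)
# Error of P divided by dx^(n+1) = dx^4: expected to stay bounded.
taylor_scaled = [abs(math.exp(dx) - p3(dx)) / dx**4 for dx in steps]
# Error of Q divided by dx^(k+1) = dx^3: expected to grow without bound.
competitor_scaled = [abs(math.exp(dx) - q3(dx)) / dx**3 for dx in steps]

for dx, t, c in zip(steps, taylor_scaled, competitor_scaled):
    print(f"dx={dx:.0e}  |f-P|/dx^4={t:.4f}  |f-Q|/dx^3={c:.4f}")
```

The first quotient hovers near $1/24$, while the second grows roughly like $0.01/\Delta x$, in line with $f - Q \ne O(K+1)$.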

Finally, we see how a Taylor polynomial becomes a better approximation as its
degree increases. There is one case where this fails to happen, namely when an
increase in the degree leaves the polynomial unchanged. For example, with $f(x) = \sin x$,
the first- and second-degree Taylor polynomials at the origin are

$$P_1(\Delta x) = \Delta x \quad \text{and} \quad P_2(\Delta x) = \Delta x,$$

so $P_2(\Delta x)$ will be no better than $P_1(\Delta x)$ in approximating $f(0 + \Delta x) = \sin \Delta x$. The
problem is that $P_2$ lacks a quadratic term, because $f''(0) = 0$. The following corollary
avoids this case by requiring $f^{(n)}(a) \ne 0$, guaranteeing that the two polynomials
are indeed different.

Corollary 3.16 Suppose $f^{(n)}(a) \ne 0$; then $R_{n,a}(\Delta x)$ vanishes to a higher order than
$R_{n-1,a}(\Delta x)$:

$$R_{n,a}(\Delta x) = O(n+1) \quad \text{but} \quad R_{n-1,a}(\Delta x) \ne O(n+1).$$

Proof. Take $Q(\Delta x) = P_{n-1,a}(\Delta x)$ in the previous corollary; then $K = n$, because the
term of degree $n$ in $Q = P_{n-1,a}$ is 0, but in $P_{n,a}$ it is $f^{(n)}(a)(\Delta x)^n/n! \ne 0$. Therefore
$R_{n-1,a}(\Delta x) \ne O(n+1)$. □

We end with a summary of definitions and results about the order of vanishing of 
functions at an arbitrary point. 


Definition 3.7 We say $\varphi(t)$ vanishes to order greater than $p$ at $t = a$, and write
$\varphi(t) = o(p)$ as $t \to a$, if

$$\lim_{t \to a} \frac{\varphi(t)}{(t - a)^p} = 0.$$


Thus, $\varphi(t) = o(p)$ is an abbreviation for $\varphi(t) = o(p)$ as $t \to 0$. We continue to use
the briefer form unless clarity requires the longer one.

Definition 3.8 The functions $\varphi(t)$ and $\psi(t)$ vanish to the same order at $t = a$ if
there are positive constants $\delta$, $C_1$, and $C_2$ for which

$$C_1 \le \left| \frac{\varphi(t)}{\psi(t)} \right| \le C_2 \quad \text{for all } 0 < |t - a| < \delta.$$

Definition 3.9 We say $\varphi(t)$ vanishes at least to order $p$ at $t = a$, and write
$\varphi(t) = O(p)$ as $t \to a$, if there are positive constants $\delta$, $C$ for which $|\varphi(t)| \le C|t - a|^p$
when $|t - a| < \delta$. Otherwise, we say $\varphi(t)$ fails to vanish to order $p$ at $t = a$, and
write $\varphi(t) \ne O(p)$ as $t \to a$.

Thus $\varphi(t) = O(p)$ is an abbreviation for $\varphi(t) = O(p)$ as $t \to 0$.

Definition 3.10 We say $\varphi(t)$ and $\psi(t)$ agree at least to order $p$ at $t = a$, and write
$\varphi(t) = \psi(t) + O(p)$ as $t \to a$, if $\varphi(t) - \psi(t) = O(p)$ as $t \to a$.


3.3 Taylor polynomials in several variables 

We obtain Taylor polynomials for a function of two variables in a natural way from 
the one-variable version. However, the formulas are messy and therefore harder to 
interpret. For example, a polynomial in two or more variables can have several terms 
of the same degree. The collection of terms of a given degree forms a homogeneous 
polynomial. A homogeneous polynomial of degree k in two variables has the general 
form 


$$Q(x,y) = A_0 x^k + A_1 x^{k-1} y + A_2 x^{k-2} y^2 + \cdots + A_{k-1} x y^{k-1} + A_k y^k = \sum_{i+j=k} A_j x^i y^j.$$

The Taylor polynomial of degree $n$ for a function $z = f(x,y)$ at $(x,y) = (a,b)$
consists of terms that are homogeneous polynomials in $\Delta x = x - a$ and $\Delta y = y - b$;
there is a homogeneous polynomial of every degree between 0 and $n$. The terms
involve the binomial coefficients

$$\binom{k}{j} = \frac{k!}{j!\,(k-j)!} = \binom{k}{k-j}$$

and partial derivatives of $f$. For the sake of visual clarity, we use subscripts to write
the partial derivatives (e.g., $\partial^3 f / \partial x^2 \partial y = f_{x^2 y}$).

Definition 3.11 Suppose all partial derivatives of $f(x,y)$ up to order $n$ exist at
$(x,y) = (a,b)$; then the Taylor polynomial of degree $n$ for $f$ at $(a,b)$ is

$$P_{n,(a,b)}(\Delta x, \Delta y) = f(a,b) + f_x(a,b)\,\Delta x + f_y(a,b)\,\Delta y$$

$$+ \frac{1}{2!} \left( f_{xx}(a,b)\,(\Delta x)^2 + 2 f_{xy}(a,b)\,\Delta x\,\Delta y + f_{yy}(a,b)\,(\Delta y)^2 \right)$$

$$+ \cdots + \frac{1}{n!} \sum_{i+j=n} \binom{n}{j} f_{x^i y^j}(a,b)\,(\Delta x)^i (\Delta y)^j.$$

Theorem 3.17 (Taylor). If $f(x,y)$ has continuous partial derivatives up to order
$n + 1$ on an open set that contains the line segment from $(a,b)$ to $(a + \Delta x, b + \Delta y)$,
then

$$f(a + \Delta x, b + \Delta y) = P_{n,(a,b)}(\Delta x, \Delta y) + R_{n,(a,b)}(\Delta x, \Delta y),$$

where

$$R_{n,(a,b)}(\Delta x, \Delta y) = \frac{1}{n!} \sum_{i+j=n+1} \binom{n+1}{j} (\Delta x)^i (\Delta y)^j \int_0^1 f_{x^i y^j}(a + t\,\Delta x, b + t\,\Delta y)(1-t)^n \, dt.$$

Proof. The idea is to have the two-variable formula emerge from Taylor's formula
for a suitably chosen function of one variable. We can assume $(\Delta x, \Delta y) \ne (0,0)$, for
otherwise there is nothing to prove. In that case, there is a unique unit vector $(\alpha, \beta)$
for which $(\Delta x, \Delta y) = s(\alpha, \beta)$ with $s > 0$. Let

$$F(s) = f(a + s\alpha, b + s\beta) = f(a + \Delta x, b + \Delta y).$$

Taylor's formula for $F(s)$ at $s = 0$ is

$$F(s) = F(0) + F'(0)\,s + \cdots + \frac{F^{(n)}(0)}{n!} s^n + \frac{s^{n+1}}{n!} \int_0^1 F^{(n+1)}(ts)(1-t)^n \, dt.$$

We claim this will turn into Taylor's formula for $f(a + \Delta x, b + \Delta y)$ when we express
each derivative of $F$ in terms of $\alpha$ and $\beta$ and partial derivatives of $f$. One application
of the chain rule gives

$$F'(s) = f_x(a + s\alpha, b + s\beta)\,\alpha + f_y(a + s\alpha, b + s\beta)\,\beta.$$

A second application gives

$$F''(s) = f_{xx}(a + s\alpha, b + s\beta)\,\alpha^2 + f_{xy}(a + s\alpha, b + s\beta)\,\alpha\beta + f_{yx}(a + s\alpha, b + s\beta)\,\beta\alpha + f_{yy}(a + s\alpha, b + s\beta)\,\beta^2$$

$$= f_{xx}(a + s\alpha, b + s\beta)\,\alpha^2 + 2 f_{xy}(a + s\alpha, b + s\beta)\,\alpha\beta + f_{yy}(a + s\alpha, b + s\beta)\,\beta^2.$$

To get a clearer idea of the patterns being generated here, we calculate one more
derivative. Applying the chain rule to each of the functions $f_{xx}(a + s\alpha, b + s\beta)$,
$f_{xy}(a + s\alpha, b + s\beta)$, and $f_{yy}(a + s\alpha, b + s\beta)$, we get


$$F'''(s) = f_{xxx}(a + s\alpha, b + s\beta)\,\alpha^3 + f_{xxy}(a + s\alpha, b + s\beta)\,\alpha^2\beta + 2 f_{xyx}(a + s\alpha, b + s\beta)\,\alpha^2\beta$$

$$+ 2 f_{xyy}(a + s\alpha, b + s\beta)\,\alpha\beta^2 + f_{yyx}(a + s\alpha, b + s\beta)\,\alpha\beta^2 + f_{yyy}(a + s\alpha, b + s\beta)\,\beta^3$$

$$= f_{xxx}(a + s\alpha, b + s\beta)\,\alpha^3 + 3 f_{xxy}(a + s\alpha, b + s\beta)\,\alpha^2\beta + 3 f_{xyy}(a + s\alpha, b + s\beta)\,\alpha\beta^2 + f_{yyy}(a + s\alpha, b + s\beta)\,\beta^3.$$

We have used $f_{xyx} = f_{xxy}$, and so forth, to combine terms.

For $k = 1$, 2, and 3, the formula for $F^{(k)}(s)$ is a sum of partial derivatives of $f$
in which the numerical coefficients are the binomial coefficients. For an arbitrary $k$,
the formula is

$$F^{(k)}(s) = \sum_{i+j=k} \binom{k}{j} f_{x^i y^j}(a + s\alpha, b + s\beta)\,\alpha^i \beta^j.$$

The next step is to determine the factor $F^{(k)}(0)\,s^k$ that appears in the $k$th term of
the Taylor polynomial for $F(s)$. We have

$$F^{(k)}(0)\,s^k = \sum_{i+j=k} \binom{k}{j} f_{x^i y^j}(a,b)\,s^k \alpha^i \beta^j = \sum_{i+j=k} \binom{k}{j} f_{x^i y^j}(a,b)\,(s\alpha)^i (s\beta)^j = \sum_{i+j=k} \binom{k}{j} f_{x^i y^j}(a,b)\,(\Delta x)^i (\Delta y)^j.$$

These expressions are equal because $s^k \alpha^i \beta^j = s^{i+j} \alpha^i \beta^j = (s^i \alpha^i)(s^j \beta^j)$ and,
furthermore, $s\alpha = \Delta x$, $s\beta = \Delta y$.

At this point we have found all the terms in the Taylor polynomial $P_{n,(a,b)}$. The
final step is to see how the remainder $R_{n,(a,b)}$ emerges from the remainder for $F(s)$.
That remainder is

$$\frac{s^{n+1}}{n!} \int_0^1 F^{(n+1)}(ts)(1-t)^n \, dt$$

$$= \frac{1}{n!} \int_0^1 \left( \sum_{i+j=n+1} \binom{n+1}{j} f_{x^i y^j}(a + ts\alpha, b + ts\beta)\,s^{n+1} \alpha^i \beta^j \right) (1-t)^n \, dt$$

$$= \frac{1}{n!} \sum_{i+j=n+1} \binom{n+1}{j} (s\alpha)^i (s\beta)^j \int_0^1 f_{x^i y^j}(a + t\,\Delta x, b + t\,\Delta y)(1-t)^n \, dt$$

$$= R_{n,(a,b)}(\Delta x, \Delta y)$$

when we set $(s\alpha)^i = (\Delta x)^i$, $(s\beta)^j = (\Delta y)^j$. □
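
The formula of Definition 3.11 translates almost word for word into a short program. The sketch below is our own illustration, not the author's; it assumes the sample function $f(x,y) = e^{x+2y}$ at $(a,b) = (0,0)$, chosen because its partial derivatives there are simply $f_{x^i y^j}(0,0) = 2^j$. The names are hypothetical.

```python
import math

def partial_at_origin(i, j):
    """f_{x^i y^j}(0, 0) for the illustrative choice f(x, y) = exp(x + 2y)."""
    return 2.0**j

def taylor_2d(n, dx, dy):
    """Degree-n Taylor polynomial at (0, 0), summed one homogeneous part at a time."""
    total = 0.0
    for k in range(n + 1):
        homogeneous = sum(
            math.comb(k, j) * partial_at_origin(k - j, j) * dx**(k - j) * dy**j
            for j in range(k + 1)
        )
        total += homogeneous / math.factorial(k)
    return total

for n in (1, 2, 4, 6):
    approx = taylor_2d(n, 0.1, 0.2)
    print(f"n={n}  P_n={approx:.8f}  error={abs(math.exp(0.5) - approx):.2e}")
```

For this particular $f$ the homogeneous part of degree $k$ collapses to $(\Delta x + 2\Delta y)^k/k!$ by the binomial theorem, which gives an independent check on the bookkeeping with binomial coefficients.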

The large and unwieldy expression for the Taylor polynomial of a function of
two variables gets even worse when there are more input variables. Before moving
on to this, we introduce a simplifying notation for the two-variable polynomial that
makes the $r$-variable case clearer.

The first step is to use vector notation. Thus, we write $\mathbf{x} = (x,y)$, $\mathbf{a} = (a,b)$,
$\Delta\mathbf{x} = (\Delta x, \Delta y) = \mathbf{x} - \mathbf{a}$. The second step is to express the various partial derivatives
in vector fashion, as well. A familiar example is the vector differential operator
“nabla” $\nabla = (\partial/\partial x, \partial/\partial y)$ that is used for the gradient: $\operatorname{grad} f = \nabla f$. The operator
we need is the dot product of $\Delta\mathbf{x}$ and $\nabla$:

$$\Delta\mathbf{x} \cdot \nabla = \Delta x \frac{\partial}{\partial x} + \Delta y \frac{\partial}{\partial y}.$$

This operator produces a certain “mixture” of the partial derivatives of any function
it operates on:

$$(\Delta\mathbf{x} \cdot \nabla) f(\mathbf{x}) = \Delta x \frac{\partial f}{\partial x}(x,y) + \Delta y \frac{\partial f}{\partial y}(x,y).$$

In particular, $(\Delta\mathbf{x} \cdot \nabla) f(\mathbf{a})$ is just the linear homogeneous (i.e., first-degree) part of
the Taylor polynomial for $f$ at $\mathbf{a}$.

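To get the homogeneous parts of higher degree, just apply the same differential
operator to its previous output. In other words, compose $\Delta\mathbf{x} \cdot \nabla$ with itself, treating $\Delta x$
and $\Delta y$ as constants with respect to the partial differential operators $\partial/\partial x$ and $\partial/\partial y$.
The resulting operator involves second derivatives:

$$(\Delta\mathbf{x} \cdot \nabla)^2 = (\Delta\mathbf{x} \cdot \nabla) \circ (\Delta\mathbf{x} \cdot \nabla) = \left( \Delta x \frac{\partial}{\partial x} + \Delta y \frac{\partial}{\partial y} \right) \circ \left( \Delta x \frac{\partial}{\partial x} + \Delta y \frac{\partial}{\partial y} \right)$$

$$= (\Delta x)^2 \frac{\partial^2}{\partial x^2} + 2\,\Delta x\,\Delta y \frac{\partial^2}{\partial x\,\partial y} + (\Delta y)^2 \frac{\partial^2}{\partial y^2}.$$

You should check that $(\Delta\mathbf{x} \cdot \nabla)^2 f(\mathbf{a})/2!$ is the homogeneous quadratic part of the
Taylor polynomial of $f$ at $\mathbf{a}$.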

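Repeated composition produces operators $(\Delta\mathbf{x} \cdot \nabla)^k$ involving derivatives of order
$k$ for any positive integer $k$. Because $\Delta\mathbf{x} \cdot \nabla$ is a binomial expression, each such
power of $\Delta\mathbf{x} \cdot \nabla$ can be expanded as if it were an ordinary binomial:

$$(\Delta\mathbf{x} \cdot \nabla)^k = \left( \Delta x \frac{\partial}{\partial x} + \Delta y \frac{\partial}{\partial y} \right)^k = \sum_{i+j=k} \binom{k}{j} (\Delta x)^i (\Delta y)^j \frac{\partial^k}{\partial x^i\,\partial y^j}.$$

Notice that this is indeed a homogeneous polynomial of degree $k$ in the variables $\Delta x$
and $\Delta y$.

In terms of $\Delta\mathbf{x} \cdot \nabla$, the Taylor polynomial for $f(\mathbf{x})$ at $\mathbf{a}$ is just

$$P_{n,\mathbf{a}}(\Delta\mathbf{x}) = f(\mathbf{a}) + (\Delta\mathbf{x} \cdot \nabla) f(\mathbf{a}) + \frac{(\Delta\mathbf{x} \cdot \nabla)^2 f(\mathbf{a})}{2!} + \cdots + \frac{(\Delta\mathbf{x} \cdot \nabla)^n f(\mathbf{a})}{n!},$$

a much simpler expression than in the original definition (Definition 3.11)! The
formula for the remainder is simplified in the same way: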


$$R_{n,\mathbf{a}}(\Delta\mathbf{x}) = \frac{1}{n!} \int_0^1 (\Delta\mathbf{x} \cdot \nabla)^{n+1} f(\mathbf{a} + t\,\Delta\mathbf{x})(1-t)^n \, dt.$$


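Let us move, finally, to the case of a function of $r$ variables. As we have seen, the
differential operator $\Delta\mathbf{x} \cdot \nabla$ plays the crucial role in the new notation. When there
are $r$ variables instead of two, so that

$$\mathbf{x} = (x_1, x_2, \ldots, x_r), \quad \mathbf{a} = (a_1, a_2, \ldots, a_r), \quad \Delta\mathbf{x} = \mathbf{x} - \mathbf{a},$$

our differential operator becomes a multinomial,

$$\Delta\mathbf{x} \cdot \nabla = \Delta x_1 \frac{\partial}{\partial x_1} + \cdots + \Delta x_r \frac{\partial}{\partial x_r},$$

instead of a binomial. Consequently, we can no longer represent the higher powers
$(\Delta\mathbf{x} \cdot \nabla)^k$ using the binomial expansion.

However, there is a way to expand multinomials that is exactly analogous to the
binomial expansion. It uses the multinomial coefficients

$$\binom{k}{p_1\ p_2\ \cdots\ p_r} = \frac{k!}{p_1!\,p_2! \cdots p_r!}, \qquad p_1 + p_2 + \cdots + p_r = k;$$

the multinomial expansion is

$$(\Delta\mathbf{x} \cdot \nabla)^k = \sum_{p_1 + \cdots + p_r = k} \binom{k}{p_1\ \cdots\ p_r} (\Delta x_1)^{p_1} \cdots (\Delta x_r)^{p_r} \frac{\partial^k}{\partial x_1^{p_1} \cdots \partial x_r^{p_r}}.$$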
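
Multinomial coefficients are simple to compute and to sanity-check. The following sketch is ours (the helper names are hypothetical); it evaluates $k!/(p_1! \cdots p_r!)$ with exact integer arithmetic and lists the exponent tuples that appear in the expansion.

```python
import math
from itertools import product

def multinomial(k, parts):
    """k! / (p_1! p_2! ... p_r!) for nonnegative integers p_i summing to k."""
    if sum(parts) != k:
        raise ValueError("the exponents must sum to k")
    result = math.factorial(k)
    for p in parts:
        result //= math.factorial(p)  # each division is exact
    return result

def exponent_tuples(k, r):
    """All (p_1, ..., p_r) with p_i >= 0 and p_1 + ... + p_r = k."""
    return [ps for ps in product(range(k + 1), repeat=r) if sum(ps) == k]

k, r = 3, 3
for ps in exponent_tuples(k, r):
    print(ps, multinomial(k, ps))
```

One consistency check: setting every variable equal to 1 in the multinomial expansion of $(x_1 + \cdots + x_r)^k$ shows that the coefficients must add up to $r^k$. With $r = 2$ the coefficients reduce to the binomial coefficients.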


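Theorem 3.18 (Taylor). If $f(\mathbf{x})$ has $n + 1$ continuous derivatives on an open set
containing the line segment from $\mathbf{a}$ to $\mathbf{a} + \Delta\mathbf{x}$, then

$$f(\mathbf{a} + \Delta\mathbf{x}) = \sum_{k=0}^{n} \frac{1}{k!} (\Delta\mathbf{x} \cdot \nabla)^k f(\mathbf{a}) + R_{n,\mathbf{a}}(\Delta\mathbf{x}),$$

where

$$R_{n,\mathbf{a}}(\Delta\mathbf{x}) = \frac{1}{n!} \int_0^1 (\Delta\mathbf{x} \cdot \nabla)^{n+1} f(\mathbf{a} + t\,\Delta\mathbf{x})(1-t)^n \, dt. \quad □$$

The remainder $R_{n,\mathbf{a}}(\Delta\mathbf{x})$ can be rewritten in different forms, just as in the one-variable
case. The proofs are the same as the one-variable versions.

Corollary 3.19 (Lagrange's form of the remainder) For each $\Delta\mathbf{x} \approx \mathbf{0}$, there is a
$\theta = \theta(\Delta\mathbf{x})$ with $0 < \theta < 1$ for which

$$R_{n,\mathbf{a}}(\Delta\mathbf{x}) = \frac{1}{(n+1)!} (\Delta\mathbf{x} \cdot \nabla)^{n+1} f(\mathbf{a} + \theta\,\Delta\mathbf{x}). \quad □$$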

The next corollary asserts that the remainder for the Taylor polynomial of degree $n$
is approximately the highest-degree homogeneous part of the Taylor polynomial of
degree $n + 1$.

Corollary 3.20 (Generalized microscope equation) When $\Delta\mathbf{x} \approx \mathbf{0}$,

$$R_{n,\mathbf{a}}(\Delta\mathbf{x}) \approx \frac{1}{(n+1)!} (\Delta\mathbf{x} \cdot \nabla)^{n+1} f(\mathbf{a}). \quad □$$

Also as in the one-variable case, the Taylor polynomial provides the “best fit” to
a function near a given point, among all polynomials of the same degree. To see this,
we use the same device we employed in the one-variable case, namely, the order of
vanishing of the remainder. The definitions are analogous. Let

$$\mathbf{t} = (t_1, \ldots, t_r), \qquad \|\mathbf{t}\| = \sqrt{t_1^2 + \cdots + t_r^2},$$

and suppose $z = \varphi(\mathbf{t})$ is a real-valued function that vanishes at the origin: $\varphi(\mathbf{0}) = 0$.
You can extend these definitions to the case where $\varphi$ vanishes at an arbitrary point $\mathbf{a}$,
as on pages 89-90.


Definition 3.12 We say the function $\varphi(\mathbf{t})$ vanishes to order greater than $p$, and
write $\varphi(\mathbf{t}) = o(p)$, if

$$\lim_{\mathbf{t} \to \mathbf{0}} \frac{\varphi(\mathbf{t})}{\|\mathbf{t}\|^p} = 0.$$

Definition 3.13 We say $\varphi(\mathbf{t})$ vanishes to order at least $p$, and write $\varphi(\mathbf{t}) = O(p)$, if
there are positive constants $\delta$, $C$ for which $|\varphi(\mathbf{t})| \le C\|\mathbf{t}\|^p$ when $\|\mathbf{t}\| < \delta$. Otherwise,
we say $\varphi(\mathbf{t})$ fails to vanish to order $p$, and write $\varphi(\mathbf{t}) \ne O(p)$.

For example, any linear function $z = L(\mathbf{t}) = m_1 t_1 + \cdots + m_r t_r$ vanishes at least to
order 1: $|L(\mathbf{t})| \le C\|\mathbf{t}\|$, for some $C$. The graph of $z = C\|\mathbf{t}\|$ is a cone, whereas the
graph of $z = |L(\mathbf{t})|$ resembles a (hyper)plane that has been folded upward along the
set where $L(\mathbf{t}) = 0$. The cone can always be elongated enough (by increasing $C$) to
make the folded hyperplane lie below it.



Corollary 3.21 $R_{n,\mathbf{a}}(\Delta\mathbf{x}) = O(n+1)$.

Proof. The proof of Taylor's theorem (Theorem 3.17) used the one-variable function

$$F(s) = f(\mathbf{a} + s\mathbf{u}) = f(\mathbf{a} + \Delta\mathbf{x}),$$

where $\mathbf{u}$ is a unit vector, $s > 0$, and $s\mathbf{u} = \Delta\mathbf{x}$. (The theorem was stated and proved
when $\Delta\mathbf{x}$ was 2-dimensional, but nothing needs to be changed in higher dimensions.)
In the proof, Taylor's formula for $F(s)$ at $s = 0$ became Taylor's formula for $f(\mathbf{x})$
at $\mathbf{x} = \mathbf{a}$. In particular, the remainder $R_{n,\mathbf{a}}(\Delta\mathbf{x})$ was just the remainder $R_{n,0}(s)$ for
$F(s)$. We know $R_{n,0}(s)$ vanishes at least to order $n + 1$ in $s$, so there are positive
constants $\delta$ and $C$ for which $|R_{n,0}(s)| \le C|s|^{n+1}$ when $|s| < \delta$. But $s = \|\Delta\mathbf{x}\|$, so

$$|R_{n,\mathbf{a}}(\Delta\mathbf{x})| = |R_{n,0}(s)| \le C|s|^{n+1} = C\|\Delta\mathbf{x}\|^{n+1}$$

when $\|\Delta\mathbf{x}\| < \delta$. □

Thus, $f(\mathbf{a} + \Delta\mathbf{x})$ agrees with its Taylor polynomial $P_{n,\mathbf{a}}(\Delta\mathbf{x})$ at least up to order
$n + 1$ in $\Delta\mathbf{x}$. There is no other polynomial of degree $n$ for which this is true; according
to the following theorem (which mimics the one-variable case), the agreement is
always of lower order.

Theorem 3.22. Suppose $Q(\Delta\mathbf{x})$ is a polynomial of degree $n$ that differs from the
Taylor polynomial $P_{n,\mathbf{a}}(\Delta\mathbf{x})$ at least in some term of degree $k \le n$; then

$$f(\mathbf{a} + \Delta\mathbf{x}) - Q(\Delta\mathbf{x}) \ne O(k+1).$$

Proof. We can use the idea of the proof of the last corollary. Write $\Delta\mathbf{x} = s\mathbf{u}$ for a
suitable $s > 0$ and unit vector $\mathbf{u}$. Then the one-variable function $q(s) = Q(\Delta\mathbf{x}) = Q(s\mathbf{u})$
is a polynomial of degree $n$.

Let $p_{n,0}(s) = P_{n,\mathbf{a}}(s\mathbf{u})$; then $p_{n,0}(s)$ is the Taylor polynomial of degree $n$ for
$F(s) = f(\mathbf{a} + s\mathbf{u})$. Therefore, $p_{n,0}(s)$ and $q(s)$ differ at least in the term of degree $k$,
implying

$$F(s) - q(s) \ne O(k+1)$$

(as functions of $s$) by Theorem 3.14. Because $F(s) = f(\mathbf{a} + \Delta\mathbf{x})$ and $q(s) = Q(\Delta\mathbf{x})$,
we have

$$f(\mathbf{a} + \Delta\mathbf{x}) - Q(\Delta\mathbf{x}) \ne O(k+1)$$

as functions of $\Delta\mathbf{x}$. □

In certain circumstances, it is possible to construct the Taylor polynomial more
directly, without evaluating a multitude of partial derivatives. For example, if $f(x,y)$
is a product in which the variables are separated, $f(x,y) = g(x)h(y)$, we can just
multiply together the Taylor polynomials for the individual factors $g$ and $h$. To illustrate,
we construct the 4th-degree polynomial for $e^x \cos y$ at $(x,y) = (0,0)$ from

$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + O(5), \qquad \cos y = 1 - \frac{y^2}{2} + \frac{y^4}{24} + O(6).$$

Now just distribute the terms of $\cos y$ over the terms of $e^x$:

$$e^x \cos y = \left( 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + O(5) \right) \times 1$$

$$- \left( 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + O(5) \right) \times \frac{y^2}{2}$$

$$+ \left( 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + O(5) \right) \times \frac{y^4}{24}$$

$$+ \left( 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + O(5) \right) \times O(6)$$

$$= 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} - \frac{y^2}{2} - \frac{xy^2}{2} - \frac{x^2 y^2}{4} + \frac{y^4}{24} + O(5)$$

$$= 1 + x + \frac{x^2 - y^2}{2!} + \frac{x^3 - 3xy^2}{3!} + \frac{x^4 - 6x^2 y^2 + y^4}{4!} + O(5).$$

All of the individual products (e.g., $x^3 y^2/12$) that do not appear explicitly in the
last two lines have been absorbed into the symbol $O(5)$ because they vanish at least
to order 5. You should check that this agrees with the definition of $P_{4,(0,0)}(x,y)$ for
$e^x \cos y$; see Exercise 3.25.
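
The bookkeeping in this product, including the absorption of high-degree products into $O(5)$, can be mechanized. The sketch below is our own illustration; it assumes a polynomial in $x$ and $y$ is stored as a dictionary from exponent pairs $(i,j)$ to exact rational coefficients, and `multiply_truncated` is a hypothetical helper name.

```python
from fractions import Fraction

def multiply_truncated(p, q, max_degree):
    """Multiply polynomials stored as {(i, j): coeff}; drop terms of degree > max_degree."""
    result = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            i, j = i1 + i2, j1 + j2
            if i + j <= max_degree:
                result[(i, j)] = result.get((i, j), Fraction(0)) + c1 * c2
    return result

# Taylor polynomials of degree 4 for exp(x) and cos(y) at 0.
exp_x = {(k, 0): Fraction(1, f) for k, f in enumerate((1, 1, 2, 6, 24))}
cos_y = {(0, 0): Fraction(1), (0, 2): Fraction(-1, 2), (0, 4): Fraction(1, 24)}

product_4 = multiply_truncated(exp_x, cos_y, 4)
for (i, j), c in sorted(product_4.items(), key=lambda item: (sum(item[0]), item[0][1])):
    print(f"x^{i} y^{j}: {c}")
```

Discarding every product of total degree greater than 4 plays the role of the symbol $O(5)$; a term such as $x^3 y^2/12$ never enters the result.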


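The last possibility we need to consider is a nonlinear map $\mathbf{f} : V^p \to \mathbb{R}^q$, where
$V^p$ is an open set in $\mathbb{R}^p$,

$$\mathbf{f} : \begin{cases} x_1 = f_1(v_1, \ldots, v_p), \\ x_2 = f_2(v_1, \ldots, v_p), \\ \quad \vdots \\ x_q = f_q(v_1, \ldots, v_p). \end{cases}$$

This is a vector-valued function, $\mathbf{x} = \mathbf{f}(\mathbf{v})$, and the Taylor polynomial of degree $n$
at $\mathbf{v} = \mathbf{a}$ for $\mathbf{f}$ is just the vector of Taylor polynomials for the individual component
functions $f_i(\mathbf{v})$. That is, let $P_{i;n,\mathbf{a}}(\Delta\mathbf{v})$ be the Taylor polynomial of degree $n$ for $f_i(\mathbf{v})$
at $\mathbf{v} = \mathbf{a}$. Then the polynomial map $\mathbf{P}_{n,\mathbf{a}} : \mathbb{R}^p \to \mathbb{R}^q$,

$$\mathbf{P}_{n,\mathbf{a}} : \begin{cases} x_1 = P_{1;n,\mathbf{a}}(\Delta v_1, \ldots, \Delta v_p), \\ x_2 = P_{2;n,\mathbf{a}}(\Delta v_1, \ldots, \Delta v_p), \\ \quad \vdots \\ x_q = P_{q;n,\mathbf{a}}(\Delta v_1, \ldots, \Delta v_p), \end{cases}$$

is the Taylor polynomial for the map $\mathbf{f}$ at the point $\mathbf{v} = \mathbf{a}$. Likewise, the remainder
is the vector of the corresponding remainder functions $x_i = R_{i;n,\mathbf{a}}(\Delta\mathbf{v})$. It is the map
$\mathbf{R}_{n,\mathbf{a}}$ from a suitable open neighborhood of $\mathbf{0}$ in $\mathbb{R}^p$ to $\mathbb{R}^q$:

$$\mathbf{R}_{n,\mathbf{a}} : \begin{cases} x_1 = R_{1;n,\mathbf{a}}(\Delta v_1, \ldots, \Delta v_p), \\ x_2 = R_{2;n,\mathbf{a}}(\Delta v_1, \ldots, \Delta v_p), \\ \quad \vdots \\ x_q = R_{q;n,\mathbf{a}}(\Delta v_1, \ldots, \Delta v_p). \end{cases}$$

Taylor's formula then holds for the maps themselves:

$$\mathbf{f}(\mathbf{a} + \Delta\mathbf{v}) = \mathbf{P}_{n,\mathbf{a}}(\Delta\mathbf{v}) + \mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v}).$$

We can even describe the order of vanishing of the remainder. Suppose $\Phi : T^p \to \mathbb{R}^q$
is a vector-valued function that vanishes at the origin: $\Phi(\mathbf{0}) = \mathbf{0}$.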


Definition 3.14 We say the function $\Phi(\mathbf{t})$ vanishes to order greater than $p$, and
write $\Phi(\mathbf{t}) = o(p)$, if

$$\lim_{\mathbf{t} \to \mathbf{0}} \frac{\|\Phi(\mathbf{t})\|}{\|\mathbf{t}\|^p} = 0.$$

Definition 3.15 We say $\Phi(\mathbf{t})$ vanishes at least to order $p$, and write $\Phi(\mathbf{t}) = O(p)$,
if there are positive constants $\delta$, $C$ for which $\|\Phi(\mathbf{t})\| \le C\|\mathbf{t}\|^p$ when $\|\mathbf{t}\| < \delta$. Otherwise,
we say $\Phi(\mathbf{t})$ fails to vanish to order $p$, and write $\Phi(\mathbf{t}) \ne O(p)$.


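Theorem 3.23. $\mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v}) = O(n+1)$.

Proof. We must show there are positive numbers $\delta$, $C$ for which

$$\|\mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v})\| \le C\|\Delta\mathbf{v}\|^{n+1} \quad \text{when } \|\Delta\mathbf{v}\| < \delta.$$

Each component of $\mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v})$ is just a real-valued function, so we know it vanishes
at least to order $n + 1$ (Taylor's theorem for real-valued functions, Theorem 3.18,
and Corollary 3.21). Hence, for each $i = 1, \ldots, q$, there are positive numbers $\delta_i$, $C_i$
for which

$$|R_{i;n,\mathbf{a}}(\Delta\mathbf{v})| \le C_i \|\Delta\mathbf{v}\|^{n+1} \quad \text{when } \|\Delta\mathbf{v}\| < \delta_i.$$

All the inequalities remain true when we take $\|\Delta\mathbf{v}\| < \delta$, where $\delta$ is the smallest of
$\delta_1, \ldots, \delta_q$.

For the magnitude of the vector-valued function $\mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v})$ we have

$$\|\mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v})\|^2 = |R_{1;n,\mathbf{a}}(\Delta\mathbf{v})|^2 + \cdots + |R_{q;n,\mathbf{a}}(\Delta\mathbf{v})|^2 \le C_1^2 \|\Delta\mathbf{v}\|^{2(n+1)} + \cdots + C_q^2 \|\Delta\mathbf{v}\|^{2(n+1)}$$

when $\|\Delta\mathbf{v}\| < \delta$. Therefore, if we set $C = \sqrt{C_1^2 + \cdots + C_q^2}$, then

$$\|\mathbf{R}_{n,\mathbf{a}}(\Delta\mathbf{v})\| \le C\|\Delta\mathbf{v}\|^{n+1}. \quad □$$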


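For our future work, the Taylor polynomial map of degree 1 is the most important.
In terms of components, it is

$$\mathbf{P}_{1,\mathbf{a}} : \begin{cases} x_1 = f_1(\mathbf{a}) + \dfrac{\partial f_1}{\partial v_1}(\mathbf{a})\,\Delta v_1 + \cdots + \dfrac{\partial f_1}{\partial v_p}(\mathbf{a})\,\Delta v_p, \\ x_2 = f_2(\mathbf{a}) + \dfrac{\partial f_2}{\partial v_1}(\mathbf{a})\,\Delta v_1 + \cdots + \dfrac{\partial f_2}{\partial v_p}(\mathbf{a})\,\Delta v_p, \\ \quad \vdots \\ x_q = f_q(\mathbf{a}) + \dfrac{\partial f_q}{\partial v_1}(\mathbf{a})\,\Delta v_1 + \cdots + \dfrac{\partial f_q}{\partial v_p}(\mathbf{a})\,\Delta v_p. \end{cases}$$

The initial constant terms are the components of the vector $\mathbf{f}(\mathbf{a})$. The remaining
terms are linear in $\Delta v_1, \ldots, \Delta v_p$; they are naturally represented by a linear map acting
on the vector $\Delta\mathbf{v}$: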


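$$d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{v}) = \begin{pmatrix} \dfrac{\partial f_1}{\partial v_1}(\mathbf{a}) & \cdots & \dfrac{\partial f_1}{\partial v_p}(\mathbf{a}) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_q}{\partial v_1}(\mathbf{a}) & \cdots & \dfrac{\partial f_q}{\partial v_p}(\mathbf{a}) \end{pmatrix} \begin{pmatrix} \Delta v_1 \\ \vdots \\ \Delta v_p \end{pmatrix}.$$

Definition 3.16 The derivative of the map $\mathbf{f} : V^p \to \mathbb{R}^q$ at $\mathbf{a}$ is the linear map
$d\mathbf{f}_{\mathbf{a}} : \mathbb{R}^p \to \mathbb{R}^q$ that is represented by the $q \times p$ matrix with components $\partial f_i/\partial v_j(\mathbf{a})$.

In terms of the derivative, the Taylor polynomial map of degree 1 for $\mathbf{f}$ at $\mathbf{a}$ is

$$\mathbf{P}_{1,\mathbf{a}}(\Delta\mathbf{v}) = \mathbf{f}(\mathbf{a}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{v}),$$

and Taylor's formula is

$$\mathbf{f}(\mathbf{a} + \Delta\mathbf{v}) = \mathbf{f}(\mathbf{a}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{v}) + O(2).$$

In the next chapter we study in detail how $d\mathbf{f}_{\mathbf{a}}$ approximates $\mathbf{f}$ near $\mathbf{a}$.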

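Here are two examples of Taylor approximations to maps. The first map is the
polar coordinate change

$$\mathbf{f} : \begin{cases} x = r \cos\theta, \\ y = r \sin\theta. \end{cases}$$

At the point $(r, \theta) = (r_0, 0)$ (so $\Delta r = r - r_0$, $\Delta\theta = \theta$), the Taylor polynomial map of
degree 3 is

$$\begin{cases} x = r_0 + \Delta r - \dfrac{r_0}{2}(\Delta\theta)^2 - \dfrac{1}{2}(\Delta r)(\Delta\theta)^2 + O(4), \\ y = r_0\,\Delta\theta + (\Delta r)(\Delta\theta) - \dfrac{r_0}{6}(\Delta\theta)^3 + O(4). \end{cases}$$

Notice that the polynomial terms are just the products of $r = r_0 + \Delta r$ with the familiar
Taylor polynomials for the cosine and sine functions.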
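
A numerical check of the $O(4)$ claim for the polar map is sketched below. It is our own illustration under stated assumptions: the base radius $r_0 = 2$ and the diagonal steps $\Delta r = \Delta\theta = h$ are arbitrary sample choices, and the function names are hypothetical.

```python
import math

def polar(r, theta):
    """The polar coordinate map (r, theta) -> (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def polar_taylor3(r0, dr, dtheta):
    """Degree-3 Taylor polynomial map of the polar map at (r0, 0)."""
    x = r0 + dr - r0 * dtheta**2 / 2 - dr * dtheta**2 / 2
    y = r0 * dtheta + dr * dtheta - r0 * dtheta**3 / 6
    return x, y

def error_norm(r0, dr, dtheta):
    """Length of the remainder vector f(a + dv) - P_3(dv)."""
    x, y = polar(r0 + dr, dtheta)
    px, py = polar_taylor3(r0, dr, dtheta)
    return math.hypot(x - px, y - py)

r0 = 2.0
steps = (0.1, 0.05, 0.025)
# Divide by ||dv||^4 with dv = (h, h); the quotient should stay bounded.
scaled = [error_norm(r0, h, h) / (math.sqrt(2) * h)**4 for h in steps]
for h, s in zip(steps, scaled):
    print(f"h={h:5.3f}  ||R_3|| / ||dv||^4 = {s:.5f}")
```

Halving the step reduces the error by roughly a factor of 16, so the scaled quantity remains essentially constant, as Theorem 3.23 predicts.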

The second map is

$$\mathbf{g} : \begin{cases} x = u^3 - 3uv^2, \\ y = 3u^2 v - v^3. \end{cases}$$

The derivative of $\mathbf{g}$ at the point $\mathbf{a} = (a,b)$ is given by the matrix

$$d\mathbf{g}_{(a,b)} = \left. \begin{pmatrix} 3u^2 - 3v^2 & -6uv \\ 6uv & 3u^2 - 3v^2 \end{pmatrix} \right|_{(u,v)=(a,b)} = \begin{pmatrix} 3a^2 - 3b^2 & -6ab \\ 6ab & 3a^2 - 3b^2 \end{pmatrix}.$$

The determinant of $d\mathbf{g}_{\mathbf{a}}$ has the simple form

$$\det(d\mathbf{g}_{\mathbf{a}}) = (3a^2 - 3b^2)^2 + (6ab)^2 = 9a^4 - 18a^2 b^2 + 9b^4 + 36a^2 b^2 = 9(a^2 + b^2)^2;$$

thus $d\mathbf{g}_{\mathbf{a}}$ is invertible for all $\mathbf{a} \ne \mathbf{0}$.
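
Both the derivative matrix and the determinant formula can be spot-checked by machine. The following sketch is ours; the names `g`, `dg`, `det2`, and `numeric_column` are hypothetical, and the central-difference step `h` is an arbitrary small value.

```python
def g(u, v):
    """The map g(u, v) = (u^3 - 3uv^2, 3u^2 v - v^3)."""
    return u**3 - 3 * u * v**2, 3 * u**2 * v - v**3

def dg(a, b):
    """Matrix of the derivative of g at (a, b)."""
    return [[3 * a * a - 3 * b * b, -6 * a * b],
            [6 * a * b, 3 * a * a - 3 * b * b]]

def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def numeric_column(a, b, direction, h=1e-6):
    """Central-difference estimate of dg/du (direction 0) or dg/dv (direction 1)."""
    du, dv = (h, 0.0) if direction == 0 else (0.0, h)
    x_plus, y_plus = g(a + du, b + dv)
    x_minus, y_minus = g(a - du, b - dv)
    return (x_plus - x_minus) / (2 * h), (y_plus - y_minus) / (2 * h)

a, b = 1, 2
print(dg(a, b), det2(dg(a, b)), 9 * (a * a + b * b)**2)
print(numeric_column(a, b, 0), numeric_column(a, b, 1))
```

The determinant vanishes only at the origin, which matches the statement that $d\mathbf{g}_{\mathbf{a}}$ is invertible for all $\mathbf{a} \ne \mathbf{0}$.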


Exercises 


3.1. Determine the mean value of each of the following functions on the given
domain.

a. $f(x) = x^n$ on $[0, 1]$.

b. $f(x) = \sin x$ on $[0, \pi]$.

c. $f(x,y) = x^2 + y^2$ on $x^2 + y^2 \le 1$. (Suggestion: Use polar coordinates.)

3.2. a. Let $R$ be the rectangle $[a,b] \times [c,d]$ in the $(x,y)$-plane. Determine the coordinates
$(\xi, \eta)$ of the point at the center of $R$.

b. Show that the mean value of $f(x,y) = \alpha x + \beta y + \gamma$ on $R$ is the value
$f(\xi, \eta) = \alpha\xi + \beta\eta + \gamma$ of $f$ at the center of the rectangle.

3.3. Show that the mean value of a linear function on a circular disk in the $(x,y)$-plane
is its value at the center of the disk.

3.4. Find $c$ in $[0,1]$ so that $\displaystyle\int_0^1 x^n \, dx = c^n$.

3.5. Find $c$ in $[0, \pi]$ so that $\displaystyle\int_0^\pi \sin x \, dx = \pi \sin c$.

3.6. Assume 0 <a < b. Find c for which 



(b 


a), 


and show that c lies between a and b. 

3.7. Find a point $(\alpha, \beta)$ in the unit disk for which

\[
\iint_{x^2+y^2 \le 1} (x^2 + y^2)\,dx\,dy = \pi(\alpha^2 + \beta^2).
\]

3.8. a. Let $f(x) = \sqrt{r^2 - x^2}$, $-r \le x \le r$. Show that

\[
\int_{-r}^{r} f(x)\,dx = 2r f(c),
\]

where $c = \pm r\sqrt{1 - \pi^2/16}$. (Suggestion: Evaluate the integral and use that
value to find c.)

b. Sketch the graph y = f(x) on the interval $-r \le x \le r$. In the sketch, mark
one of the points c and sketch the horizontal graph y = f(c) to make a
rectangle over the interval [−r, r]. Does this rectangle appear to contain the
same area as the semicircular graph of f(x) over [−r, r]? Is this what you
expect?

3.9. a. Prove the generalized integral law of the mean when the condition $g(x) \ge 0$
on [a, b] has been changed to $g(x) \le 0$.

b. Suppose f(x) = g(x) = x. Show that there is no c in the interval [−1,1] for
which

\[
\int_{-1}^{1} f(x)g(x)\,dx = f(c)\int_{-1}^{1} g(x)\,dx.
\]

Why does this not contradict the generalized integral law of the mean?

3.10. Prove the law of the mean for double integrals (Theorem 3.7). Why does the 
domain D have to be connected? 

3.11. a. Obtain estimates for the numerical values of $\sqrt{102}$ and $\sqrt{120}$ using the
Taylor polynomials $P_{n,100}(\Delta x)$ of degree n = 2, 3, 4, and 5 for $f(x) = \sqrt{x}$
at a = 100. Use these values to verify that, for each n, the error estimating
$\sqrt{102}$ is only about $1/10^{n+1}$ times the size of the error estimating $\sqrt{120}$.

b. For each n = 2, 3, 4, and 5, sketch the graph of the remainder function
$y = R_{n,100}(\Delta x) = \sqrt{100 + \Delta x} - P_{n,100}(\Delta x)$ on the interval $-1 \le \Delta x \le 1$.
How does your graph indicate that $R_{n,100}(\Delta x) = O(n+1)$?

c. For each n = 2, 3, 4, and 5, there is a $C_n$ for which

\[
R_{n,100}(\Delta x) = \sqrt{100 + \Delta x} - P_{n,100}(\Delta x) \approx C_n (\Delta x)^{n+1}.
\]

Determine $C_n$ and sketch $R_{n,100}(\Delta x)$ and $C_n(\Delta x)^{n+1}$ together on the same
axes to indicate that $R_{n,100}(\Delta x) = O(n+1)$. In each case, take $-1 \le \Delta x \le 1$.

3.12. a. Construct the Taylor polynomials $P_{n,1}(\Delta x)$ centered at a = 1 for $f(x) = \ln x$;
take n = 1, 2, 3, and 4.

b. Obtain estimates for ln 1.02 and ln 1.2 using the four different Taylor
polynomials you found in part (a), and determine the error in each of these
estimates. Is the error that $P_{n,1}$ makes for ln 1.02 only about $1/10^{n+1}$ times
the size of the error the same polynomial makes for ln 1.2? Explain.

c. For each n = 1, 2, 3, and 4, sketch the graph of the function $y = R_{n,1}(\Delta x) =
\ln(1 + \Delta x) - P_{n,1}(\Delta x)$ on the interval $-0.3 \le \Delta x \le 0.3$. Does your graph
demonstrate that $R_{n,1}(\Delta x) = O(n+1)$? How, or why not?

3.13. Prove the induction step 




^^(Ax)* +1 +^ +1 , a (Ax) 


in the proof of Taylor’s theorem. 

3.14. Prove l’Hôpital’s rule in the following form. Suppose $f(a) = f'(a) = \cdots =
f^{(n-1)}(a) = 0$, $g(a) = g'(a) = \cdots = g^{(n-1)}(a) = 0$, and either $f^{(n)}(a) \ne 0$ or
$g^{(n)}(a) \ne 0$ (or both); then



(Suggestion: Use Taylor’s formula with Lagrange’s form of the remainder, for
both $f(a + \Delta x)$ and $g(a + \Delta x)$.)

3.15. Let $\varphi(t) = t^a$, $1 < a < 2$. Show that $\varphi(t) = o(1)$ but $\varphi(t) \ne O(2)$.

3.16. Use the fact that $e^x$ grows faster than any positive power of x (i.e., $x^p/e^x \to 0$
as $x \to \infty$ for any p > 0) to show that $\psi(u) = \exp(-1/|u|)$ vanishes to order
greater than p for any p > 0. It follows that we can define $\psi(0) = 0$. Sketch
the graph of $t = \psi(u)$ and determine the image of $\psi$ on the t-axis.

3.17. Show that the condition $\varphi(t) = o(p)$ can be expressed in the following way.
Given any $\varepsilon > 0$, there is a $\delta > 0$ so that

\[
|t| < \delta \implies |\varphi(t)| \le \varepsilon |t|^p.
\]

(Suggestion: The fact that $\varphi(t)/t^p \to 0$ as $t \to 0$ means that $|\varphi(t)/t^p|$ can be
made smaller than any preassigned $\varepsilon > 0$ by making |t| sufficiently small, i.e.,
by making $|t| < \delta$ for some suitable $\delta$.)


Note: Although this formulation of “little oh” may seem more cumbersome, it has 
the advantage of avoiding using a quotient, a useful feature in some of our later work 
(cf. p. 133). This formulation is also more like our definition of “big oh.” 
3.18. Let $\varphi(t) = -1/\ln t$, $0 < t < 1$, $\varphi(0) = 0$; the graph of $\varphi$ appears in the
margin. The goal of this exercise is to show that $\varphi(|t|) = -1/\ln|t|$ vanishes to
order less than p for any p > 0. This is true if, for a given p > 0 and for
every $\theta$ sufficiently small, $t^p < \varphi(t)$ for every $0 < t < \theta$ (the contrapositive of
Exercise 3.17).

a. Show that $t = \psi(u)$ (Exercise 3.16) is invertible on u > 0, and show that
$u = \varphi(t)$ is the inverse.

b. Fix p > 0, set q = 1/p, and choose $\delta > 0$ so that $|u| < \delta \implies \psi(u) < |u|^q$
(as provided by Exercise 3.17 with any $\varepsilon < 1$). Now take any $0 < \theta < \psi(\delta)$
and let $0 < t < \theta$ be arbitrary. Set $u = \varphi(t)$ and show that $t^p < u = \varphi(t)$.

3.19. Let $f(x) = \sqrt{x}$; show that

\[
f^{(k+1)}(x) = \pm\frac{1 \cdot 3 \cdots (2k-1)}{2^{k+1}\, x^{k+1/2}}.
\]

3.20. In the microscope equation, $\Delta y \approx f'(a)\,\Delta x$, the nature of the approximation is
unclear. Show that the microscope equation has the more explicit form $\Delta y =
f'(a)\,\Delta x + O(2)$; in words, “$\Delta y$ agrees with $f'(a)\,\Delta x$ at least up to order 2
in $\Delta x$.”

3.21. Adapt the proof of Theorem 3.14 to prove Corollary 3.15. 

3.22. The purpose of this exercise is to show that the extent to which a polynomial 
approximates a given function near a given point depends on the extent to 
which it matches the Taylor polynomial constructed at that point (cf. Theo¬ 
rem 3.14 and Corollary 3.15). 

a. Show that $P(x) = 1 + x + \tfrac{1}{2}x^2 + \tfrac{1}{6}x^3$ is the Taylor polynomial of degree 3
at x = 0 for the function $e^x$.

b. Sketch the graph of $y = R(x) = e^x - P(x)$ in a small neighborhood of x = 0
to demonstrate that R(x) = O(4), as required by Taylor’s theorem.

c. Sketch the graph ofy = V\ (x) = e v — (1 +x+ |x 2 + |x 3 ) in a small neigh¬ 
borhood of x = 0. Determine the value of p for which V\ (x) = O(p) and 
V x (x)^0(p+\). 

d. Sketch the graph of y = V 2 (x) = e Y — (1 + x + jx 2 + ^x 3 ) in a small neigh¬ 
borhood of x = 0. Determine the value of p for which p 2 (x) = O(p) and 
V 2 (x)^0(p+ 1). 

e. Sketch the graph of y = V^(x) = e* — (1 + l.lx+ \x 2 + ^x 3 ) in a small 
neighborhood of x = 0. Determine the value of p for which V^(x) = O(p) 
and V 3 (x) ± 0(p+ 1). 

f. Sketch the graph ofy = V 4 (x) = e* — (1.1 +x + jx 2 + gx 3 ) in a small neigh¬ 
borhood of x = 0. Determine the value of p for which V 4 (x) = O(p) and 
V 4 (x)^0(p+ 1). 
3.23. Write the Taylor polynomial of degree 2 for the given function at the given
point.

a. e A 'siny at (0,0) d. ln(x 2 A-y 2 ) at (1,0) 

b. cosxcosy at (0 ,tt/ 2) e. xyz at (1,— 2,4) 

c. x 3 — Ix+y 2 at (—1,0) f. 1 — cos0 + ^v 2 at (7T,0) 

3.24. Write the Taylor polynomial of degree 4 for (x 2 A-y 2 ) 2 — (x 2 A-y 2 ) at the point 

(x,y) = (1/2,1/2). 

3.25. Show that the Taylor polynomial of degree 4 for $e^x \cos y$ at (x,y) = (0,0), as
obtained from the definition, agrees with the computation done on page 96.

3.26. Write out in words what “$O(p) \cdot O(q) = O(p + q)$” means, and prove it.

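3.27. Construct the Taylor polynomial of degree 2 centered at the point $(\rho, \theta, \varphi) =
(\rho_0, \pi/2, 0)$ for the spherical coordinate change

\[
\mathbf{s} : \begin{cases} x = \rho\cos\theta\cos\varphi, \\ y = \rho\sin\theta\cos\varphi, \\ z = \rho\sin\varphi; \end{cases}
\qquad -\pi \le \theta \le \pi, \quad -\pi/2 \le \varphi \le \pi/2.
\]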


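3.28. a. Suppose $L : \mathbb{R}^p \to \mathbb{R}^q$ is linear; show $L(\Delta\mathbf{u})$ vanishes at least to first order
in $\Delta\mathbf{u}$. In fact, show there is a positive number C for which $\|L(\Delta\mathbf{u})\| \le
C\|\Delta\mathbf{u}\|$ for all $\Delta\mathbf{u}$.

b. The smallest number C for which this inequality holds is called the norm
of the linear map L, written $|||L|||$. It follows that $\|L(\Delta\mathbf{u})\| \le |||L|||\,\|\Delta\mathbf{u}\|$
for all $\Delta\mathbf{u}$. Show that

\[
|||L||| = \max_{\|\Delta\mathbf{u}\| = 1} \|L(\Delta\mathbf{u})\|.
\]

c. Suppose the linear map $L : \mathbb{R}^p \to \mathbb{R}^p : \Delta\mathbf{u} \mapsto \Delta\mathbf{x}$ is invertible. Show that
$L$ and $L^{-1}$ vanish exactly to order 1 in the sense that there are bounding
constants $0 < A_1 \le A_2$, $0 < B_1 \le B_2$ for which

\[
A_1 \le \frac{\|L(\Delta\mathbf{u})\|}{\|\Delta\mathbf{u}\|} \le A_2
\quad\text{and}\quad
B_1 \le \frac{\|L^{-1}(\Delta\mathbf{x})\|}{\|\Delta\mathbf{x}\|} \le B_2
\]

for all $\Delta\mathbf{u}, \Delta\mathbf{x} \ne \mathbf{0}$. (This is an adaptation of Definition 3.4 to multivariable
functions.)

d. Show that we can take $B_1 = 1/A_2$, $B_2 = 1/A_1$ in part (b).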





Chapter 4 

The Derivative 


Abstract The derivative of a map is the linear term in its Taylor approximation; it is
a map itself. Because linear approximations are simpler than those of higher order,
and because linear maps are easier to visualize than nonlinear ones, the derivative is
an especially important part of the study of maps. It gives us valuable local
information. We study the derivative in this chapter, beginning with the familiar connection
to tangents.


4.1 Differentiability 


Analytically, a function y = f(x) is differentiable at a point if a certain limit exists;
geometrically, the graph of the function must have a tangent at that point. When
there are several input variables, $y = f(x_1, \ldots, x_p)$, the geometric characterization is
the same (the graph must have a tangent), but the analytic one becomes uncertain:
Is it enough for the partial derivatives to exist, or must the directional derivatives
exist in all directions, or is even more necessary? In this section, we introduce the
derivative map to settle the question and to make a clear connection between the
analytic and geometric aspects of differentiability.

According to the usual definition, y = f(x) is differentiable at x = a if

\[
\lim_{\Delta x \to 0} \frac{f(a + \Delta x) - f(a)}{\Delta x} = f'(a),
\]

for some finite number $f'(a)$ that we then call the derivative of f at a. In that case,
we can rewrite the limit expression in the form

\[
\lim_{\Delta x \to 0} \frac{f(a + \Delta x) - f(a) - f'(a)\,\Delta x}{\Delta x} = 0.
\]

This says that the numerator, as a function of Ax, vanishes to order greater than 1 
(cf. Definition 3.3, p. 85). In other words, the usual definition of differentiability is 
DOI 10.1007/978-1-4419-7332-0_4, © Springer Science+Business Media, LLC 2010


Differentiability of 

y = f(x 


Differentiability 
in terms of “little oh” 
equivalent to the following equality involving “little oh”:

\[
\underbrace{f(a + \Delta x)}_{\text{values of } f \text{ near } a}
= \underbrace{f(a) + f'(a)\,\Delta x}_{\text{values along tangent line at } a} + o(1).
\]


Differentiability and 
local linearity 


Comparison with 
Taylor’s theorem 


Differentiability for 

z = f{x,y) 



We recognize $y = f(a) + f'(a)\,\Delta x$ as the formula for the tangent line at x = a, so
the equation tells us what it means for the graph of f to have a tangent: the gap
between the graph of f and its tangent line vanishes more rapidly than the horizontal
displacement from the point of tangency. We can take this as the geometric definition
of differentiability. Another name for differentiability, understood geometrically, is
local linearity: under sufficiently high magnification (i.e., for $\Delta x$ sufficiently small),
the graph of f at x = a is indistinguishable from the linear graph of the tangent there.

Notice that our new geometric formula for differentiability is similar to Taylor’s
formula,

\[
f(a + \Delta x) = f(a) + f'(a)\,\Delta x + O(2).
\]

The difference lies solely in the order of vanishing of the remainder; Taylor’s
formula has the stronger condition $R_{1,a}(\Delta x) = O(2)$. (For example, $t^{4/3} = o(1)$ but
$t^{4/3} \ne O(2)$; see also Exercise 3.15, p. 102.) But the hypothesis that Taylor’s
formula rests upon is stronger, too: Taylor’s theorem requires that f have a continuous
second derivative on an open interval that contains a and $a + \Delta x$. However, as we
have seen, the limit defining the derivative leads us to the formula

\[
f(a + \Delta x) = f(a) + f'(a)\,\Delta x + o(1)
\]

that involves “little oh” rather than “big oh.”
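The difference between the two orders of vanishing can be seen numerically with the example $t^{4/3}$ mentioned above. This is only an illustrative sketch; the function and variable names (`phi`, `ratios`, `first_order`, `second_order`) and the sample values of t are assumptions, not part of the text.

```python
def phi(t):
    """The example phi(t) = |t|^(4/3), which is o(1) but not O(2)."""
    return abs(t) ** (4.0 / 3.0)

def ratios(ts, p):
    """Quotients phi(t) / |t|^p for each sample t."""
    return [phi(t) / abs(t) ** p for t in ts]

ts = [10.0 ** (-k) for k in range(1, 7)]

# phi(t)/|t| = |t|^(1/3) tends to 0: phi vanishes to order greater than 1.
first_order = ratios(ts, 1)

# phi(t)/|t|^2 = |t|^(-2/3) is unbounded: phi does not vanish to order 2.
second_order = ratios(ts, 2)

for t, r1, r2 in zip(ts, first_order, second_order):
    print(f"t={t:.0e}  phi/|t|={r1:.4f}  phi/|t|^2={r2:.1f}")
```

The first column of quotients shrinks toward 0 while the second grows without bound, so a remainder of this size satisfies the “little oh” condition of differentiability without satisfying the “big oh” condition in Taylor’s formula.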

Let us move on to the differentiability of z = f(x,y) at (x,y) = (a,b), and
approach it from the geometric point of view. In terms of coordinates $\Delta x = x - a$,
$\Delta y = y - b$ centered at (a, b), an arbitrary plane has a formula that we can write as

\[
z = c + p\,\Delta x + q\,\Delta y.
\]

We require the gap $f(a + \Delta x, b + \Delta y) - (c + p\,\Delta x + q\,\Delta y)$ to vanish more rapidly than
the horizontal displacement $\sqrt{(\Delta x)^2 + (\Delta y)^2}$ to the point of tangency (a, b).

Definition 4.1 The function z = f(x,y) is differentiable, or locally linear, at (x,y) =
(a, b) if there are constants c, p, and q for which

\[
f(a + \Delta x, b + \Delta y) - (c + p\,\Delta x + q\,\Delta y) = o(1).
\]

In that case, the graph of $z = c + p\,\Delta x + q\,\Delta y$ is the tangent plane to the graph of f
at the point (a, b).

Theorem 4.1. If z = f(x,y) is differentiable at (a,b), then both partial derivatives
exist at (a,b) and the equation of the tangent plane there is

\[
z = f(a,b) + f_x(a,b)\,\Delta x + f_y(a,b)\,\Delta y.
\]

Proof. In terms of the definition, we must show

\[
c = f(a,b), \qquad p = \frac{\partial f}{\partial x}(a,b), \qquad q = \frac{\partial f}{\partial y}(a,b);
\]

in particular, we must show that the two partial derivatives exist. For a start, the
expression

\[
f(a + \Delta x, b + \Delta y) - (c + p\,\Delta x + q\,\Delta y)
\]

must vanish when $\Delta x = \Delta y = 0$. This implies c = f(a,b). Now keep $\Delta y = 0$ but let
$\Delta x$ vary. The hypothesis then becomes

\[
f(a + \Delta x, b) - (f(a,b) + p\,\Delta x) = o(1),
\]

and it means (cf. Definition 3.3, p. 85)

\[
0 = \lim_{\Delta x \to 0} \frac{f(a + \Delta x, b) - f(a,b) - p\,\Delta x}{\Delta x}
  = \lim_{\Delta x \to 0} \frac{f(a + \Delta x, b) - f(a,b)}{\Delta x} - p.
\]

Therefore

\[
p = \lim_{\Delta x \to 0} \frac{f(a + \Delta x, b) - f(a,b)}{\Delta x} = \frac{\partial f}{\partial x}(a,b);
\]

that is, the partial derivative exists and has the value p. The value of q is determined
in a similar way, by fixing $\Delta x = 0$ and letting $\Delta y$ vary. □


The partial derivatives that define the tangent plane are, at the same time, the
components of the 1 × 2 matrix that defines the derivative of f at (a,b)
(Definition 3.16, p. 99). The following corollary makes explicit this (natural!) connection
between differentiability and the derivative.

Corollary 4.2 If z = f(x,y) is differentiable at (a,b), then the derivative
$df_{(a,b)} : \mathbb{R}^2 \to \mathbb{R}$ exists and

\[
f(a + \Delta x, b + \Delta y) = f(a,b) + df_{(a,b)}(\Delta x, \Delta y) + o(1). \qquad \Box
\]

Derivative of f

Do partial derivatives
imply differentiability?

A reasonable question to ask at this point is: if both partial derivatives of f(x,y)
exist at (a,b), is f then differentiable at (a,b)? In other words, if the plane

\[
z = f(a,b) + f_x(a,b)\,\Delta x + f_y(a,b)\,\Delta y
\]

is defined, is it not automatically the tangent plane to the graph of f at (a,b)? Is it
not guaranteed that the gap between this plane and the graph of f must vanish to
higher order than the horizontal displacement $(\Delta x, \Delta y)$ from the point (a, b)?

Counterexample:
the “manta ray”

A bundle of lines
through the origin

No plane is tangent to
the graph at the origin

In fact, the answer is no. Here is an example that illustrates the contrary (a
counterexample):

\[
f(x,y) = \begin{cases} 0 & \text{if } (x,y) = (0,0), \\[4pt] \dfrac{x^2 y}{x^2 + y^2} & \text{otherwise.} \end{cases}
\]



The two figures are different views of the graph of z = f(x, y); it looks vaguely like a
manta ray swimming along the y-axis. The essential thing to note is that the graph is
made up of a bundle of straight lines through the origin. An easy way to confirm this
is to put a polar coordinate overlay on the graph. That is, let $x = r\cos\theta$, $y = r\sin\theta$;
then, away from the origin,

\[
z = f(x,y) = \frac{r^3\cos^2\theta\,\sin\theta}{r^2} = r\cos^2\theta\,\sin\theta.
\]

When $\theta$ is fixed (as it is along a radial line), this is the straight line z = mr of slope
$m = \cos^2\theta\,\sin\theta$. The Mathematica 5 code that produces the figures makes use of
this overlay:


ParametricPlot3D[{r Cos[t], r Sin[t], r Cos[t]^2 Sin[t]},
   {r, 0, 1}, {t, 0, 2 Pi}, PlotPoints -> {10, 49},
   BoxRatios -> {1, 1, .8}, Axes -> False, Boxed -> False,
   ViewPoint -> {3.500, -0.680, 1.221}]


Let us now see why no plane is tangent to the graph at the origin. Along the
radial line $\theta = \theta_0$, the graph is the straight line of slope $m = \cos^2\theta_0\,\sin\theta_0$. If $\theta_0$ is
an integer multiple of $\pi/2$, then m = 0 and the radial line lies along an axis. So this
slope is a partial derivative; we find $f_x(0,0) = f_y(0,0) = 0$. Therefore, if the graph
were to have a tangent at the origin, Theorem 4.1 would force it to be the (x,y)-plane
itself: z = 0. In that case, the slope of the graph in any direction at the origin would
be 0. But the figures make it clear (and the formula $m = \cos^2\theta_0\,\sin\theta_0$ confirms) that
the slope of the radial line in the direction $\theta_0 \ne k\pi/2$ is nonzero. The manta ray
graph has no tangent plane at the origin; the function f is not differentiable there.
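The argument can be mirrored in a short numerical experiment. The sketch below is an illustration only; the names `manta`, `radial_slope`, and `gap_ratio`, and the sample radii, are assumptions introduced here.

```python
import math

def manta(x, y):
    """The manta ray function: x^2 y / (x^2 + y^2), with value 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def radial_slope(theta, r=1e-3):
    """Slope manta(r cos(theta), r sin(theta)) / r of the graph along a ray."""
    return manta(r * math.cos(theta), r * math.sin(theta)) / r

def gap_ratio(theta, r):
    """Gap between the graph and the candidate tangent plane z = 0,
    divided by the horizontal displacement r from the origin."""
    return abs(manta(r * math.cos(theta), r * math.sin(theta))) / r

# Along the axes the slope is 0, so both partial derivatives vanish.
print(radial_slope(0.0), radial_slope(math.pi / 2))

# Along the diagonal the ratio gap/displacement stays fixed instead of
# shrinking to 0, so the plane z = 0 is not tangent to the graph.
for r in (1e-1, 1e-3, 1e-6):
    print(r, gap_ratio(math.pi / 4, r))
```

If f were differentiable at the origin, the printed ratios would have to tend to 0 as r shrinks; instead they stay at the slope of the radial line.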
So the mere existence of the partial derivatives of z = f(x,y) at a point is not
enough to guarantee that f is differentiable at that point, i.e., that its graph has a
tangent plane there. But suppose we impose a stronger condition, one requiring that
all directional derivatives exist. We recall the definition of a directional derivative
for a function of p variables (where p need not equal 2).

Definition 4.2 Let u be a unit vector; then the directional derivative of z = f(x) at
the point x = a in the direction u is

\[
D_{\mathbf{u}} f(\mathbf{a}) = \frac{d}{dt} f(\mathbf{a} + t\mathbf{u})\bigg|_{t=0}
\]

when the expression on the right exists.

Let $\mathbf{u} = \mathbf{e}_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ (i.e., 1 in the i-th place, 0 elsewhere); the derivative
in the direction $\mathbf{e}_i$ is just the usual partial derivative:

\[
D_{\mathbf{e}_i} f(\mathbf{a}) = \frac{\partial f}{\partial x_i}(\mathbf{a}).
\]

Suppose we now require that the directional derivatives of z = f(x,y) in all
directions (and not just the axis directions) exist at a point. Will this stronger condition
guarantee that the graph of f has a tangent at that point? In fact, the manta ray is
still a counterexample. In the direction $\mathbf{u} = (\cos\theta, \sin\theta)$, the directional derivative
of f at the origin is

\[
D_{(\cos\theta,\,\sin\theta)} f(0,0) = \cos^2\theta\,\sin\theta.
\]

(In the given direction, the graph of f is a straight line of slope $\cos^2\theta\,\sin\theta$.) Thus,
even though all the directional derivatives of f exist at the origin, there is (still)
no tangent plane. The existence of all directional derivatives of f at a point is not
enough to guarantee that f is differentiable there.

Although the existence of directional derivatives does not guarantee differentia¬ 
bility, the converse is true, according to the following theorem. 


Theorem 4.3. If z = f(x,y) is differentiable at (a, b), then all directional derivatives
exist at (a,b). In fact, $D_{(\alpha,\beta)} f(a,b) = df_{(a,b)}(\alpha, \beta)$.

Proof. The proof is probably easier to follow in vector notation. We set $(\alpha, \beta) = \mathbf{u}$;
then, by definition, the directional derivative is

\[
D_{\mathbf{u}} f(\mathbf{a}) = \lim_{t \to 0} \frac{f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a})}{t}
= \lim_{t \to 0} \frac{df_{\mathbf{a}}(t\mathbf{u}) + o(1)}{t}.
\]

We have $f(\mathbf{a} + t\mathbf{u}) - f(\mathbf{a}) = df_{\mathbf{a}}(t\mathbf{u}) + o(1)$ because f is differentiable at a. But then

\[
\lim_{t \to 0} \frac{df_{\mathbf{a}}(t\mathbf{u}) + o(1)}{t}
= \lim_{t \to 0} \frac{t\,df_{\mathbf{a}}(\mathbf{u})}{t} + \lim_{t \to 0} \frac{o(1)}{t}
= df_{\mathbf{a}}(\mathbf{u}) + 0 = df_{\mathbf{a}}(\mathbf{u}).
\]

We have used the linearity of $df_{\mathbf{a}}$ to write $df_{\mathbf{a}}(t\mathbf{u}) = t\,df_{\mathbf{a}}(\mathbf{u})$. Also, $o(1)/t \to 0$ as
$t \to 0$, by definition of o(1). □

Directional derivatives

Do directional
derivatives imply
differentiability?

Directional derivatives
from the derivative

The gradient vector

[Figure: the gradient vector $\nabla f$ and a direction $(\alpha, \beta)$.]

Local linearity
with level curves

The gradient of z = f(x,y) at (a,b) is the vector $\operatorname{grad} f(a,b) = \nabla f(a,b)$ in $\mathbb{R}^2$
whose components are $f_x(a,b)$ and $f_y(a,b)$. Of course these are, at the same time,
the components of the matrix that represents the derivative $df_{(a,b)} : \mathbb{R}^2 \to \mathbb{R}^1$.
Conceptually, $\nabla f(a,b)$ and $df_{(a,b)}$ are different; the first is a vector, the second is a linear
map. However, because matrix multiplication involves the scalar (dot) product, the
two are connected:

\[
df_{(a,b)}(\alpha, \beta) = \begin{pmatrix} f_x(a,b) & f_y(a,b) \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
= \nabla f(a,b) \cdot (\alpha, \beta).
\]

This connection allows us to express the previous theorem in a way that is probably
more familiar.

Corollary 4.4 If z = f(x,y) is differentiable at (a, b), then

\[
D_{(\alpha,\beta)} f(a,b) = \nabla f(a,b) \cdot (\alpha, \beta). \qquad \Box
\]

Let C be the circle that has the vector $\nabla f(a,b)$ as diameter. Then, because the scalar
$\nabla f(a,b) \cdot (\alpha, \beta)$ is the perpendicular projection of $\nabla f(a,b)$ on the line in the
direction of $(\alpha, \beta)$, we can realize $D_{(\alpha,\beta)} f$ as the length of the chord of C that lies in the
direction $(\alpha, \beta)$. If $(\alpha, \beta)$ makes an obtuse angle with $\nabla f$, extend $-(\alpha, \beta)$ and note
that then $D_{(\alpha,\beta)} f < 0$.

The hypothesis that f is differentiable is crucial in the corollary. If we merely
know that all the directional derivatives exist (including the partial derivatives), we
cannot conclude that $D_{(\alpha,\beta)} f(a,b) = \nabla f(a,b) \cdot (\alpha, \beta)$. The manta ray at the origin
is once again a counterexample: we have

\[
D_{(\alpha,\beta)} f(0,0) = \alpha^2\beta \quad\text{but}\quad \nabla f(0,0) \cdot (\alpha, \beta) = (0,0) \cdot (\alpha, \beta) = 0.
\]
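A quick numerical comparison shows the corollary holding for a differentiable function and failing for the manta ray. This is a sketch under stated assumptions: the function `smooth` (sin x · e^y) is a hypothetical example chosen here, not one from the text, and the derivatives are estimated by central differences with an assumed step size.

```python
import math

def directional_derivative(f, a, b, alpha, beta, t=1e-6):
    """Central difference quotient estimating D_(alpha,beta) f(a, b)."""
    return (f(a + t * alpha, b + t * beta) - f(a - t * alpha, b - t * beta)) / (2 * t)

def gradient(f, a, b, h=1e-6):
    """Estimated gradient: the directional derivatives along the two axes."""
    return (directional_derivative(f, a, b, 1.0, 0.0, h),
            directional_derivative(f, a, b, 0.0, 1.0, h))

def smooth(x, y):
    """A differentiable function (hypothetical example, not from the text)."""
    return math.sin(x) * math.exp(y)

def manta(x, y):
    """The manta ray function, defined to be 0 at the origin."""
    return 0.0 if x == 0.0 and y == 0.0 else x * x * y / (x * x + y * y)

def compare(f, a, b, theta):
    """Return (directional derivative, gradient dot direction) for the unit
    direction (cos(theta), sin(theta))."""
    alpha, beta = math.cos(theta), math.sin(theta)
    gx, gy = gradient(f, a, b)
    return directional_derivative(f, a, b, alpha, beta), gx * alpha + gy * beta

print(compare(smooth, 0.3, -0.2, 1.0))        # the two numbers agree
print(compare(manta, 0.0, 0.0, math.pi / 4))  # alpha^2 * beta versus 0
```

For the manta ray the first number is $\alpha^2\beta$ while the dot product with the (zero) gradient is 0, which is the discrepancy described above.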

Because we used tangents to define differentiability, it is natural to use tangents
to illustrate local linearity. For example, consider the function $f(x,y) = x^2 - y^2$ at
the point (a, b) = (2, −1). The graph of z = f(x,y) is a curved surface in $\mathbb{R}^3$ and the
graph of the derivative

\[
df_{(2,-1)}(\Delta x, \Delta y) = f_x(2,-1)\,\Delta x + f_y(2,-1)\,\Delta y = 4\,\Delta x + 2\,\Delta y
\]

is a plane. We expect that, under sufficient magnification at the point (x,y,z) =
(2, −1, f(2, −1)) in $\mathbb{R}^3$, the graph of f will become indistinguishable from the
tangent plane; cf. Exercise 4.2.

We, however, take a different approach, comparing instead the level sets of f and
$df_{(2,-1)}$ in windows centered at (2, −1) in the (x,y)-plane. The window on the left
below shows level curves

\[
f(2 + \Delta x, -1 + \Delta y) - f(2, -1) = \Delta z, \qquad \Delta z = -5, -3.75, \ldots, 5,
\]

in the square $1 \le x \le 3$, $-2 \le y \le 0$. By design, the level curve $\Delta z = 0$ passes
through the origin of the $(\Delta x, \Delta y)$-window. Obviously f is nonlinear: its level sets
are curved, and they are unequally spaced.




Under tenfold magnification (see the right window; spacing between levels has
been cut by the same factor of 10), the local linearity of f begins to emerge. The level
curves of f are now essentially straight, parallel, and evenly spaced: the hallmarks
of a linear function. Thus, at this magnification, f looks linear. We must now just
check that the apparently linear function we see in this window agrees with the
derivative,

\[
df_{(2,-1)}(\Delta x, \Delta y) = 4\,\Delta x + 2\,\Delta y.
\]

The level curves $4\,\Delta x + 2\,\Delta y = C$ of the derivative are parallel straight lines with
common slope $\Delta y/\Delta x = -2$, just as in the microscope window on the right. But this
is not yet enough; we need to show that a given line represents the same level for f
and for $df_{(2,-1)}$. This is made easier by the fact that f itself has a simple formula:

\[
\begin{aligned}
z &= f(2 + \Delta x, -1 + \Delta y) \\
  &= (2 + \Delta x)^2 - (-1 + \Delta y)^2 = 4 + 4\,\Delta x + (\Delta x)^2 - 1 + 2\,\Delta y - (\Delta y)^2 \\
  &= 3 + df_{(2,-1)}(\Delta x, \Delta y) + (\Delta x)^2 - (\Delta y)^2.
\end{aligned}
\]

Along the diagonals of the window, $(\Delta x)^2 = (\Delta y)^2$, so

\[
\Delta z = f(2 + \Delta x, -1 + \Delta y) - 3 = df_{(2,-1)}(\Delta x, \Delta y);
\]

in other words, f and its derivative agree exactly on the diagonals. It follows that,
everywhere in the right window, a given level curve for f is indistinguishable from
the level curve for $df_{(2,-1)}$ at the same level.
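The algebra above says that the gap between $\Delta z$ and the derivative is exactly $(\Delta x)^2 - (\Delta y)^2$. A minimal sketch that checks this, with the helper names `df` and `gap` chosen here for illustration:

```python
def f(x, y):
    """The function f(x, y) = x^2 - y^2 from the text."""
    return x * x - y * y

def df(dx, dy):
    """The derivative of f at (2, -1): df(dx, dy) = 4 dx + 2 dy."""
    return 4 * dx + 2 * dy

def gap(dx, dy):
    """Difference between the true change in f and the derivative's prediction."""
    return f(2 + dx, -1 + dy) - f(2, -1) - df(dx, dy)

# On the diagonals dx = +/- dy the gap vanishes.
print(gap(0.25, 0.25), gap(0.5, -0.5))

# Off the diagonals the gap is (dx)^2 - (dy)^2, so shrinking the window
# by a factor of 10 shrinks the gap by a factor of 100.
print(gap(0.1, 0.0), gap(0.01, 0.0))
```

A tenfold magnification therefore reduces the gap relative to the window size by a factor of 10, which is why the level curves in the right window look linear.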


Local linearity emerges


Comparing level curves
of f and $df_{(2,-1)}$


4.2 Maps of the plane


Visualizing maps 

f: R 2 —> R 2 


First example: 
polar coordinates 


Our goal here is to understand what differentiability means geometrically for a map
of the plane. Suppose $\mathbf{f} : U^2 \to \mathbb{R}^2$ has the coordinate form

\[
\mathbf{f} : \begin{cases} x = f(u,v), \\ y = g(u,v). \end{cases}
\]

Here $U^2$ is a window of the form $|u - a| < p$, $|v - b| < q$ (or, more generally, an open
set in $\mathbb{R}^2$; cf. Definition 8.4, p. 277). The derivative of f at the point a = (a, b) is the
linear map $d\mathbf{f}_{\mathbf{a}} : \mathbb{R}^2 \to \mathbb{R}^2$ whose coordinate matrix is

\[
d\mathbf{f}_{(a,b)} = \begin{pmatrix} f_u(a,b) & f_v(a,b) \\ g_u(a,b) & g_v(a,b) \end{pmatrix}
\]

(Definition 3.16, p. 99). The graph of f and the graph of $d\mathbf{f}_{(a,b)}$ are both
2-dimensional surfaces, but they lie in the 4-dimensional (u, v, x, y)-space, so we
cannot visualize them directly. In particular, we cannot see how, or whether, the
graph of the derivative is tangent to the graph of the map.

We faced this dimension problem with the graph of a linear map of the plane in
Chapter 2. There, we solved the problem by looking at images instead of graphs; we
do the same here. Differentiability is then manifested as local linearity: we compare
the image of f in a microscope window centered at a to the image of the linear
map $d\mathbf{f}_{\mathbf{a}}$ in that window.


The polar coordinate change is the map that pulls back Cartesian coordinates to
polar coordinates:

\[
\mathbf{f} : \begin{cases} x = r\cos\theta, \\ y = r\sin\theta. \end{cases}
\]

The map f puts a grid of rays and concentric circles on the (x,y)-plane,
corresponding to the rectangular grid $\theta$ = constant and r = constant in the $(r, \theta)$-plane itself.
By convention, only the positive half of each ray is used; that is, we assume r > 0.
(In other words, the domain $U^2$ for f is the open right half-plane.) Sometimes it is
useful to allow r = 0, as well. This is the $\theta$-axis; the map f collapses it to a single
point, the origin in the target.

According to Taylor’s theorem, any smooth map is approximately a polynomial
when its domain is restricted to a small enough region. The closeness of the
approximation is directly related to the degree of the polynomial, but even a linear
polynomial can provide an impressive approximation. Let us focus on the point
$(r, \theta) = (3, \pi/6)$ and see how f becomes approximately linear as it acts on smaller
and smaller regions centered at this point. In the process we also see that the
approximation is precisely the linear term in the Taylor polynomial of f, the map that
we call the derivative of f at $(r, \theta) = (3, \pi/6)$ and denote with the symbol $d\mathbf{f}_{(3,\pi/6)}$.


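Local behavior of f near
$(r, \theta) = (3, \pi/6)$

[Figure: the map f in two pairs of microscope windows centered at $(r, \theta) = (3, \pi/6)$.
Upper source window: $2.6 \le r \le 3.4$, $\pi/6 - 0.4 \le \theta \le \pi/6 + 0.4$. Lower source
window: $2.96 \le r \le 3.04$, $\pi/6 - 0.04 \le \theta \le \pi/6 + 0.04$. Window coordinates are
$\Delta r$ and $\Delta\theta$.]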

The figure above shows what f does to a grid of squares in a small window
centered at the point $(r, \theta) = (3, \pi/6)$. The image is a grid of radial lines and circular
arcs in another small window centered at the image point $(x,y) = (3\sqrt{3}/2, 3/2)$. To
describe the action of f in these windows it is natural to use coordinates $\Delta r$, $\Delta\theta$, $\Delta x$,
and $\Delta y$ that measure displacements from the center of each window:

\[
\begin{aligned}
\Delta r &= r - 3, & \Delta x &= x - 3\sqrt{3}/2, \\
\Delta\theta &= \theta - \pi/6, & \Delta y &= y - 3/2.
\end{aligned}
\]

In the figure, there are two pairs of windows at different levels of magnification. In
the upper pair the grid spacing is 0.1 units and the windows themselves measure
0.8 units on a side. The radial lines we see in the image window therefore have a
separation of $\Delta\theta = 0.1$ radians and the concentric circular arcs have a separation of
$\Delta r = 0.1$ units. At this level of magnification the arcs are still noticeably curved.

Each lower window is a tenfold magnification of the center of the window above
it. The radial lines in the image are now only $\Delta\theta = 0.01$ radians apart; they look
nearly parallel. The concentric circular arcs are spaced $\Delta r = 0.01$ units apart and
likewise appear to be straight and parallel. In this microscopic view, f looks like a
linear map, because it maps a grid of congruent squares to a grid of (nearly)
congruent parallelograms (rectangles, in fact).

Views of f in two
“microscope” windows

Near $(3, \pi/6)$, f is a
stretch and rotation

The derivative $d\mathbf{f}_{(r_0,\theta_0)}$

The derivative splits
into two factors

Can we describe f in the lower windows in the fashion of a linear map? The line
$\Delta\theta = 0$ in the source (the $\Delta r$-axis) is just the horizontal line $\theta = \pi/6$, so its image
is the radial line that makes an angle of $\pi/6$ radians, or 30°, as it passes through
the origin $(\Delta x, \Delta y) = (0,0)$ of the microscope window in the target. In other words,
f rotates the $\Delta r$-axis by 30°; the figure makes it clear that the whole $(\Delta r, \Delta\theta)$-plane
undergoes essentially the same rotation. In addition, before f rotates the plane it
stretches it vertically; by eye, the stretch factor appears to be about 3.

Now compare this action with the action of the derivative $d\mathbf{f}_{(3,\pi/6)}$. At an arbitrary
point $(r, \theta) = (r_0, \theta_0)$, the derivative $d\mathbf{f}_{(r_0,\theta_0)}$ is defined (see p. 112) to be the linear
map whose matrix is

\[
d\mathbf{f}_{(r_0,\theta_0)} =
\begin{pmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{pmatrix}\Bigg|_{(r,\theta)=(r_0,\theta_0)}
= \begin{pmatrix} \cos\theta_0 & -r_0\sin\theta_0 \\ \sin\theta_0 & r_0\cos\theta_0 \end{pmatrix}.
\]

Notice that $d\mathbf{f}_{(r_0,\theta_0)}$ factors neatly into a pair of matrices, a stretch (or strain) $S_{1,r_0}$,
followed by a rotation $R_{\theta_0}$:

\[
d\mathbf{f}_{(r_0,\theta_0)} = R_{\theta_0}\, S_{1,r_0}
= \begin{pmatrix} \cos\theta_0 & -\sin\theta_0 \\ \sin\theta_0 & \cos\theta_0 \end{pmatrix}
  \begin{pmatrix} 1 & 0 \\ 0 & r_0 \end{pmatrix}.
\]

Factoring $d\mathbf{f}_{(r_0,\theta_0)}$ means that we can describe its effect in two stages. First, $S_{1,r_0}$
stretches the plane vertically (i.e., in the direction of the $\Delta\theta$-axis) by the factor $r_0$;
then $R_{\theta_0}$ rotates the result by $\theta_0$ radians. (The coordinate names $\Delta\xi$, $\Delta\eta$ in the
intermediate window are just arbitrary choices.)
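The factoring into a vertical stretch followed by a rotation can be verified by multiplying the two matrices. The following is a small sketch; the function names `polar_derivative`, `rotation`, `vertical_stretch`, and `matmul` are illustrative choices made here.

```python
import math

def polar_derivative(r0, theta0):
    """Matrix of the derivative of the polar coordinate map at (r0, theta0)."""
    return ((math.cos(theta0), -r0 * math.sin(theta0)),
            (math.sin(theta0),  r0 * math.cos(theta0)))

def rotation(theta0):
    """Rotation of the plane by theta0 radians."""
    return ((math.cos(theta0), -math.sin(theta0)),
            (math.sin(theta0),  math.cos(theta0)))

def vertical_stretch(r0):
    """Stretch by the factor r0 in the vertical direction only."""
    return ((1.0, 0.0), (0.0, r0))

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

# Stretch first, then rotate: the rotation matrix goes on the left.
product = matmul(rotation(math.pi / 6), vertical_stretch(3.0))
print(product)
print(polar_derivative(3.0, math.pi / 6))
```

The two printed matrices agree, and the determinant of the product is the stretch factor 3, because a rotation does not change area.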
In particular, we can now conclude that $d\mathbf{f}_{(3,\pi/6)}$ is a threefold vertical stretch
followed by a 30° rotation. But this is exactly how f itself behaves in a microscope
window centered at $(r, \theta) = (3, \pi/6)$. This suggests we say that f is locally linear in
a small neighborhood of $(3, \pi/6)$.

The example is leading us to say that a map $\mathbf{f} : U^2 \to \mathbb{R}^2 : \mathbf{u} \mapsto \mathbf{x}$ will be locally
linear (or differentiable) at a point u = a if

\[
\Delta\mathbf{x} = \mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a})
\]

differs from a linear function of $\Delta\mathbf{u}$ by an amount that vanishes faster than $\Delta\mathbf{u}$. Here
is a precise definition.

Definition 4.3 The map x = f(u) is differentiable, or locally linear, at u = a if there
is a linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$, called the derivative of f at a, for which

\[
\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) = \mathbf{f}(\mathbf{a}) + L(\Delta\mathbf{u}) + o(1).
\]

Theorem 4.5. If $\mathbf{f} : U^2 \to \mathbb{R}^2$ is differentiable at u = a, then $L = d\mathbf{f}_{\mathbf{a}}$. In particular,
all the partial derivatives appearing in the matrix $d\mathbf{f}_{\mathbf{a}}$ exist.

Proof. See Chapter 4.4, where the theorem is restated (as Theorem 4.6) for the
general case $\mathbf{f} : U^p \to \mathbb{R}^q$. □

The theorem makes it clear that if f is locally linear at a, then its linear
approximation is its derivative $d\mathbf{f}_{\mathbf{a}}$. Note that if we rewrite the window equation

\[
\Delta\mathbf{x} = \mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}) = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1)
\]

without the remainder term o(1), we get an approximation that is, in effect, a new
form of the microscope equation:

\[
\Delta\mathbf{x} \approx d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}).
\]

In other words, the microscope equation emerges as a (rather condensed) way of
expressing the differentiability or local linearity of a map. We have already noted
the connection between the microscope equation and Taylor’s theorem in Chapter 3
(p. 83; p. 95).
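For the polar coordinate map at $(3, \pi/6)$, the quality of the microscope equation can be measured directly. The sketch below is illustrative only: the helper name `relative_error` and the sample window sizes 0.4, 0.04, 0.004 are assumptions made here.

```python
import math

def f(r, theta):
    """The polar coordinate map (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def df(r0, theta0, dr, dtheta):
    """The derivative of f at (r0, theta0) applied to the displacement (dr, dtheta)."""
    return (math.cos(theta0) * dr - r0 * math.sin(theta0) * dtheta,
            math.sin(theta0) * dr + r0 * math.cos(theta0) * dtheta)

def relative_error(r0, theta0, dr, dtheta):
    """Size of the remainder f(a + du) - f(a) - df_a(du), divided by the size of du."""
    x0, y0 = f(r0, theta0)
    x1, y1 = f(r0 + dr, theta0 + dtheta)
    lx, ly = df(r0, theta0, dr, dtheta)
    return math.hypot(x1 - x0 - lx, y1 - y0 - ly) / math.hypot(dr, dtheta)

# Displacements to a window corner at three levels of magnification.
errors = [relative_error(3.0, math.pi / 6, s, s) for s in (0.4, 0.04, 0.004)]
print(errors)
```

Each tenfold magnification cuts the relative error by roughly a factor of 10, which is the numerical meaning of a remainder that vanishes faster than the displacement.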

Definition 4.4 If $\mathbf{f} : U^2 \to \mathbb{R}^2$ is differentiable at a, its local area multiplier at a is
the area multiplier of its derivative $d\mathbf{f}_{\mathbf{a}}$.

For the polar coordinate map x = f(r), we find that the area multiplier of $d\mathbf{f}_{\mathbf{r}_0}$
is $r_0$:

\[
\det d\mathbf{f}_{\mathbf{r}_0} = \det\begin{pmatrix} \cos\theta_0 & -r_0\sin\theta_0 \\ \sin\theta_0 & r_0\cos\theta_0 \end{pmatrix}
= r_0\cos^2\theta_0 + r_0\sin^2\theta_0 = r_0.
\]

Thus we say that $r_0$ is the local area multiplier for f itself at the point $\mathbf{r}_0 = (r_0, \theta_0)$.
It is evident in the figure below that the local area multiplier of f varies from point to
point and increases with the radius r; our calculations show that the local multiplier
is exactly r.

$d\mathbf{f}$ near $(3, \pi/6)$

Differentiability and
local linearity

The microscope
equation

Local area multiplier

Local linearity versus
“looking linear locally”

Second example:
a quadratic map


[Figure: congruent regions A, B, C in the $(r, \theta)$-plane and their images f(A), f(B), f(C)
in the (x, y)-plane.]

For plane maps f like the polar coordinate map, we have used the notions

   f is arbitrarily close to $d\mathbf{f}_{\mathbf{a}}$ near a   and   f “looks like” $d\mathbf{f}_{\mathbf{a}}$ near a

more or less interchangeably. However, the two are subtly different. The first, of
course, is what we now call local linearity. The second, however, may not be true if
$d\mathbf{f}_{\mathbf{a}}$ fails to be invertible; it is a stronger condition. To help bring into sharper focus
the distinction between these two notions, and to see how the second condition can
fail, we analyze a second map.

Consider the quadratic map $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$, defined by the equations

\[
\mathbf{f} : \begin{cases} x = u^2 - v^2, \\ y = 2uv. \end{cases}
\]

Although the action of the polar coordinate change map was immediately evident 
on a global level (i.e., on the entire right half-plane r > 0), the same is not true for 
the quadratic map f. However, the action is not hard to describe; we now show f 
squares the distance of any point from the origin and doubles the angle that point 
makes with the positive horizontal axis. 




Polar coordinate 
overlays 


We can show that f acts this way by translating our formulas for f into polar
coordinates, because they provide the angles and distances we wish to measure.
That is, think of polar coordinates $(r,\theta)$ as an “overlay” on the $(x,y)$-plane, and
introduce the same overlay on the $(u,v)$-plane using new polar coordinates $(\rho,\varphi)$:
$u = \rho\cos\varphi$, $v = \rho\sin\varphi$. Then the formulas for f that define x and y in terms of u
and v translate into expressions for $r$ and $\theta$ in terms of $\rho$ and $\varphi$:

\[ r\cos\theta = x = u^2 - v^2 = \rho^2\cos^2\varphi - \rho^2\sin^2\varphi = \rho^2\cos 2\varphi, \]
\[ r\sin\theta = y = 2uv = 2\rho\cos\varphi\cdot\rho\sin\varphi = \rho^2\sin 2\varphi. \]

In other words, $r\cos\theta = \rho^2\cos 2\varphi$ and $r\sin\theta = \rho^2\sin 2\varphi$, so

\[ r = \rho^2 \quad\text{and}\quad \theta = 2\varphi. \]

Thus, in terms of the polar coordinate overlays on the source and target, f squares
the distance of a point from the origin ($r = \rho^2$) and it doubles the angle that the
point makes with the horizontal ($\theta = 2\varphi$).
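A quick numerical experiment illustrates the distance-squaring and angle-doubling. This is a sketch only; the function names and the sample point $(\rho,\varphi) = (1.5, 0.7)$ are our own illustrative choices.

```python
import math

def quadratic_map(u, v):
    """The quadratic map f: x = u^2 - v^2, y = 2uv."""
    return (u * u - v * v, 2.0 * u * v)

def polar_of(x, y):
    """Distance from the origin and angle in (-pi, pi] of the point (x, y)."""
    return (math.hypot(x, y), math.atan2(y, x))

rho, phi = 1.5, 0.7          # arbitrary source point, given in polar form
u, v = rho * math.cos(phi), rho * math.sin(phi)
r, theta = polar_of(*quadratic_map(u, v))
print(r, rho ** 2)           # the two distances should agree
print(theta, 2 * phi)        # the two angles should agree

# Points 180 degrees apart in the source share a single image.
print(quadratic_map(u, v), quadratic_map(-u, -v))
```

Because `atan2` reports angles in $(-\pi, \pi]$, the comparison of $\theta$ with $2\varphi$ is direct only when $2\varphi$ lies in that range, as it does here.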

The angle-doubling means f “fans out” the upper half-plane v > 0 in the source 
to cover the entire target (x,y)-plane. The lower half-plane v < 0 also covers the 
entire (x,y)-plane, so the source covers the target twice, except for the origin. More 
precisely, let $V^2$ be the plane minus the origin: $V^2 = \mathbb{R}^2 \setminus \{(0,0)\}$. Then $\mathbf{f}: V^2 \to V^2$
is a 2-1 map. Every point in the target $V^2$ is the image of exactly two points in the
source $V^2$ (that lie 180° apart at the same distance from the origin). The unit circle
maps to itself. A concentric circle inside the unit circle maps to another one even 
closer to the origin; one outside the unit circle is mapped to another farther from the 
origin. 

With this clear picture of the global behavior of f, it is easy to analyze its local
behavior near a given point. For example, take the point $(u,v) = (\sqrt{3}/2, 1/2)$. Its
image is $(x,y) = (1/2, \sqrt{3}/2)$. These points are 1 unit from the origin ($\rho = r = 1$)
and make angles of $\varphi = 30° = \pi/6$ radians and $\theta = 60° = \pi/3$ radians, respectively.


[Figure: behavior of f near $(u,v) = (\sqrt{3}/2, 1/2)$. Microscope windows are centered on the 30° line in the source and on its image, the 60° line, in the target.]


The figure above uses the polar coordinate overlays to show how f acts in microscope
windows centered at these two points. In the polar grid in the source window,
the spacing between adjacent circular arcs is $\Delta\rho = 1/36 \approx 0.028$ units; the spacing
between radial line segments is $\Delta\varphi = 1.5° \approx 0.026$ radians. Because these numbers
are nearly equal and relatively small, the grid looks approximately square.

The target window shows the image of the source grid under the map f. The 
30°-line from the source gives us our bearings. At the macroscopic level, f maps 
it to the 60°-line, so in the microscope window f rotates it by 30°. The entire grid 
is carried along by this action, so f rotates all points in the source by 30°, more 
or less. Obviously, there is also linear expansion we must take into account. The 
image grid is still approximately square; thus f must be close to a uniform dilation. 
Moreover, a single square in the image grid is about the size of a 2 x 2 square in the 
source grid, so the linear expansion factor is about 2 (and the area expansion factor 
is about 4). Therefore, in the microscope windows it appears that f approximately
doubles all lengths and rotates points by about 30°, or $\pi/6$ radians. In other words, f
approximates the linear map $2R_{\pi/6}$ in a small neighborhood of $(u,v) = (\sqrt{3}/2, 1/2)$.

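We expect, therefore, that the derivative of f at $(\sqrt{3}/2, 1/2)$ must equal $2R_{\pi/6}$.
Can we confirm this? First of all, at an arbitrary point $(u,v) = (a,b)$, the derivative
of f is given by the matrix

\[ d\mathbf{f}_{(a,b)} = \begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix}_{(u,v)=(a,b)} = \begin{pmatrix} 2a & -2b \\ 2b & 2a \end{pmatrix} = 2\begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \]

Therefore,

\[ d\mathbf{f}_{(\sqrt{3}/2,\,1/2)} = 2\begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix} = 2\begin{pmatrix} \cos\pi/6 & -\sin\pi/6 \\ \sin\pi/6 & \cos\pi/6 \end{pmatrix} = 2R_{\pi/6}, \]

so $d\mathbf{f}_{(\sqrt{3}/2,\,1/2)}$ is indeed the local linear approximation to f at $(\sqrt{3}/2, 1/2)$.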

Before we consider the local behavior of f at a second point, we pause to note
that our formula, above, for the derivative $d\mathbf{f}_{(a,b)}$ shows that it is a rotation-dilation
matrix (cf. p. 39 ff). We exclude the special case $(a,b) = (0,0)$, where $d\mathbf{f}_{(0,0)}$ is just
the zero matrix. Thus, assuming $(a,b) \ne (0,0)$,

\[ d\mathbf{f}_{(a,b)} = 2\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = 2\sqrt{a^2+b^2}\; R_{\arctan(b/a)}. \]

This says that $d\mathbf{f}_{(a,b)}$ is rotation by $\theta = \arctan(b/a)$ followed by a uniform linear
dilation by the factor $2\sqrt{a^2+b^2}$. (Here $\arctan(b/a)$ is understood to be the polar angle
of the point $(a,b)$, so it takes the quadrant of $(a,b)$ into account.) The local area
multiplier for f at $(a,b)$ is therefore $4(a^2+b^2)$.
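The factorization of the derivative into a dilation and a rotation can be checked numerically. The sketch below uses names of our own choosing, and it uses `math.atan2(b, a)` for the rotation angle; this is our substitution for $\arctan(b/a)$, with which it agrees when $a > 0$, and it returns the polar angle of $(a,b)$ in the other quadrants.

```python
import math

def rotation_dilation(a, b):
    """Scale and angle of df_(a,b) = [[2a, -2b], [2b, 2a]], for (a, b) != (0, 0)."""
    return (2.0 * math.hypot(a, b), math.atan2(b, a))

def scaled_rotation(scale, angle):
    """The matrix scale * R_angle as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s], [scale * s, scale * c]]

# The two sample points are the ones examined in this section.
points = [(math.sqrt(3) / 2, 0.5), (-3 * math.sqrt(2) / 4, 3 * math.sqrt(2) / 4)]
for a, b in points:
    scale, angle = rotation_dilation(a, b)
    # dilation factor, rotation in degrees, local area multiplier
    print(scale, math.degrees(angle), scale ** 2)
    # the rebuilt matrix should match [[2a, -2b], [2b, 2a]]
    print(scaled_rotation(scale, angle), [[2 * a, -2 * b], [2 * b, 2 * a]])
```

The printed area multiplier `scale ** 2` equals $4(a^2+b^2)$, the determinant of the derivative matrix.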

In Euclidean geometry, a rotation-dilation matrix such as $d\mathbf{f}_{(a,b)}$, with $(a,b) \ne (0,0)$,
is also known as a similarity transformation. Even when $d\mathbf{f}_{(a,b)}$ alters lengths
(i.e., when $2\sqrt{a^2+b^2} \ne 1$), angles remain unchanged; therefore, a plane figure and
its image under $d\mathbf{f}_{(a,b)}$ are similar. A map such as f whose derivative is a similarity
at each point in an open region is said to be conformal on that region.

For our second illustration, let us study the local behavior of f at the point
$(u,v) = (-3\sqrt{2}/4, 3\sqrt{2}/4)$ (so $(\rho,\varphi) = (3/2, 3\pi/4)$). The derivative is

\[ d\mathbf{f}_{(-3\sqrt{2}/4,\,3\sqrt{2}/4)} = \begin{pmatrix} -3\sqrt{2}/2 & -3\sqrt{2}/2 \\ 3\sqrt{2}/2 & -3\sqrt{2}/2 \end{pmatrix} = 3R_{3\pi/4}. \]

Therefore, if f is locally linear at $(-3\sqrt{2}/4, 3\sqrt{2}/4)$, we expect f will approximately
triple all lengths and rotate all points by $3\pi/4$ radians (i.e., 135°) in a microscope
window centered at $(-3\sqrt{2}/4, 3\sqrt{2}/4)$.

[Figure: behavior of f near $(u,v) = (-3\sqrt{2}/4, 3\sqrt{2}/4)$, shown in source and target microscope windows.]


The figure shows the action of f. To describe it, we again use a polar grid overlay 
in the source. At the macroscopic level, f “fans out” the second quadrant to cover the 
third and fourth quadrants. At the microscopic level, the spacing between concentric
circular arcs in the source grid is $\Delta\rho = 1/36 \approx 0.028$ units, just as it was in our
first illustration. However, we have reduced the spacing between radial lines in the
grid to $\Delta\varphi = 1° = \pi/180$ radians; because $\rho \approx 1.5$ in the window, the width
between adjacent rays is about $1.5 \times \pi/180 \approx 0.026$ units. (By the definition of
radian measure, an angle of $\varphi$ radians at the center of a circle of radius $\rho$ cuts off
an arc of length $\rho\cdot\varphi$ on the circle.) This adjustment keeps the source grid roughly
square.

It is evident from the image in the target window that f is, once again, approximately
a rotation coupled with a uniform dilation. This time the 135°-line is our
landmark; f maps it to the 270°-line in the target, rotating all points in the window
therefore by about 135°. A square in the image grid is about the size of a 3 × 3
square in the source grid, so the linear dilation factor is about 3. Hence $d\mathbf{f}$ is indeed
the local linear approximation to f at $(-3\sqrt{2}/4, 3\sqrt{2}/4)$.

Our third illustration analyzes the action of f at the origin. Here we finally get to
see the difference between looking linear and local linearity. At the origin, the local
action of f is the same as its global action: in any microscope window, no matter how
small, f doubles angles and squares lengths. But no linear map does this, so f near
the origin does not “look like” any linear map. In particular, f does not “look like”
its derivative $d\mathbf{f}_{(0,0)}$ there. Nevertheless, f is locally linear at the origin; we show this
in order to confirm that f is well-approximated by $d\mathbf{f}_{(0,0)}$ there.

How can this be? How can f be locally linear at a point and not look linear at 
that point? How can a map be “well-approximated” by a linear map and not “look 
like” that linear map? The explanation lies in the definition of local linearity. A map 
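f is locally linear at u = a (p. 115) if

\[ \Delta\mathbf{x} = \mathbf{f}(\mathbf{a}+\Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}) = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1); \]

that is, if the difference $\Delta\mathbf{x} - d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})$ vanishes more rapidly than $\Delta\mathbf{u}$. To see that this
is indeed true for our quadratic map, note first that we can compute $\Delta\mathbf{x} = (\Delta x, \Delta y)$
exactly:

\[ \begin{aligned}
\Delta x &= (a+\Delta u)^2 - (b+\Delta v)^2 - (a^2 - b^2) \\
         &= 2a\,\Delta u - 2b\,\Delta v + (\Delta u)^2 - (\Delta v)^2, \\
\Delta y &= 2(a+\Delta u)(b+\Delta v) - 2ab \\
         &= 2b\,\Delta u + 2a\,\Delta v + 2\,\Delta u\,\Delta v.
\end{aligned} \]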

These window equations take the vector form

\[ \Delta\mathbf{x} = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + \mathbf{f}(\Delta\mathbf{u}), \]

showing us that the remainder term is just $\mathbf{f}(\Delta\mathbf{u})$. But $\mathbf{f}(\Delta\mathbf{u}) = O(2)$ because $\mathbf{f}(\Delta\mathbf{u})$
is quadratic, and this, in turn, implies the weaker condition $\mathbf{f}(\Delta\mathbf{u}) = o(1)$ for local
linearity. Thus, f is “well-approximated” by the linear map $d\mathbf{f}_{\mathbf{a}}$ near a, for every
point a, including the origin. But whether f “looks like” its linear approximation $d\mathbf{f}_{\mathbf{a}}$
in a microscope window centered at a will depend on the relative sizes of the two
terms in the formula for $\Delta\mathbf{x}$.
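The exactness of the window equation, and the way the remainder shrinks relative to the displacement, can be seen in a short computation. This is a sketch under our own naming; the base point and the direction of the displacement are arbitrary illustrative choices.

```python
import math

def quadratic_map(u, v):
    """The quadratic map f: x = u^2 - v^2, y = 2uv."""
    return (u * u - v * v, 2.0 * u * v)

def derivative_apply(a, b, du, dv):
    """df_(a,b) applied to (du, dv), using the matrix [[2a, -2b], [2b, 2a]]."""
    return (2 * a * du - 2 * b * dv, 2 * b * du + 2 * a * dv)

def window_gap(a, b, du, dv):
    """Largest component of dx - df_a(du) - f(du); zero up to rounding."""
    moved, base = quadratic_map(a + du, b + dv), quadratic_map(a, b)
    lin, rem = derivative_apply(a, b, du, dv), quadratic_map(du, dv)
    return max(abs(moved[i] - base[i] - lin[i] - rem[i]) for i in range(2))

a, b = math.sqrt(3) / 2, 0.5                 # the first window's center
for t in [1e-1, 1e-2, 1e-3]:
    du, dv = t, -2 * t                       # fixed direction, shrinking size
    rem = quadratic_map(du, dv)
    ratio = math.hypot(*rem) / math.hypot(du, dv)
    print(t, window_gap(a, b, du, dv), ratio)
```

The gap stays at rounding level for every displacement, while the ratio of the remainder's length to the displacement's length shrinks in proportion to the displacement; that is the sense in which the remainder is $o(1)$.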

For example, in the window centered at $\mathbf{a} = (\sqrt{3}/2, 1/2)$ (i.e., the first window),
the window equation for f is

\[ \Delta\mathbf{x} = 2R_{\pi/6}(\Delta\mathbf{u}) + \mathbf{f}(\Delta\mathbf{u}). \]

The linear term is the rotation and uniform dilation $2R_{\pi/6}(\Delta\mathbf{u})$. This linear map is
invertible, so it vanishes exactly to order 1 (Exercise 3.28, p. 104). By contrast, the
second term is the remainder $\mathbf{f}(\Delta\mathbf{u})$ and, as such, vanishes at least to order 2. Thus,
when $\Delta\mathbf{u} \approx \mathbf{0}$ (in other words, in the microscope window), the linear term dominates,
precisely because it vanishes to a lower order in $\Delta\mathbf{u}$. This is why the map f looks
like its linear approximation $2R_{\pi/6}$ near $(\sqrt{3}/2, 1/2)$.

The behavior of f in the second window (where $\mathbf{a} = (-3\sqrt{2}/4, 3\sqrt{2}/4)$) is entirely
similar: the linear term $3R_{3\pi/4}(\Delta\mathbf{u})$ is invertible so it again dominates the
quadratic one $\mathbf{f}(\Delta\mathbf{u})$. Thus in the second window f looks like its linear approximation
$3R_{3\pi/4}(\Delta\mathbf{u})$.

At the origin, the window equation still expresses the local linearity of f, but it
has the fundamentally different form

\[ \Delta\mathbf{x} = d\mathbf{f}_{\mathbf{0}}(\Delta\mathbf{u}) + O(2) = \mathbf{0} + \mathbf{f}(\Delta\mathbf{u}). \]

The linear term, which had been dominant in the other windows, here contributes
nothing. It vanishes to infinite order, meaning it vanishes at least to order p for
every p > 0. The value of $\Delta\mathbf{x}$ is determined solely by the quadratic term $\mathbf{f}(\Delta\mathbf{u})$. This
is what accounts for the angle-doubling and distance-squaring. By default, $\mathbf{f}(\Delta\mathbf{u})$ is
the dominant term; in fact, it vanishes exactly to order 2 (see Exercise 4.12), so we
are justified in saying it dominates any map (such as $d\mathbf{f}_{\mathbf{0}}$) that vanishes to higher
order. Thus, in a microscope window centered at the origin, f does not look like
its linear approximation, because the linear approximation is not the dominant term
in $\Delta\mathbf{x}$. Instead, f looks like (indeed, is equal to) the quadratic term $\mathbf{f}(\Delta\mathbf{u})$.

In summary, f will look like its linear approximation when that linear approximation
is invertible, but need not otherwise. At the moment, we are relying only on
an intuitive understanding of what it means for one map to “look like” another. We
make the idea precise in the chapter on inverse maps (Chapter 5), where we say that
two maps look alike if we can transform one into the other by a coordinate change.
This is the same approach we took in Chapter 2 for linear maps. In that case, the
coordinate change also involved finding a certain inverse map.


4.3 Parametrized surfaces 


For another useful set of examples to illustrate the role of the derivative, we turn to
surfaces in $\mathbb{R}^3$ given parametrically. Such a surface is the image of a map $\mathbf{f}: U^2 \to \mathbb{R}^3$
of a 2-dimensional region $U^2$ (in the same way that a parametrized curve is the
image of a 1-dimensional interval). Our aim is to see how the derivative $d\mathbf{f}_{\mathbf{u}}$ is related
to the map f near u.

Our first example is the unit sphere in $\mathbb{R}^3$, given parametrically as

\[ \mathbf{f}: \begin{cases} x = \cos\theta\cos\varphi, \\ y = \sin\theta\cos\varphi, \\ z = \sin\varphi. \end{cases} \]

The image is indeed the unit sphere centered at the origin because every image point
is exactly 1 unit from the origin:

\[ x^2 + y^2 + z^2 = \cos^2\theta\cos^2\varphi + \sin^2\theta\cos^2\varphi + \sin^2\varphi = \cos^2\varphi + \sin^2\varphi = 1. \]


Because $\cos\theta$ and $\sin\theta$ have period $2\pi$, it is sufficient to take $-\pi \le \theta \le \pi$. When
$\varphi = 0$, we have

\[ x = \cos\theta, \quad y = \sin\theta, \quad z = 0; \]

thus $\mathbf{f}(\theta, 0)$ traces out the unit circle in the $(x,y)$-plane, the equator of the sphere.
When $\varphi = \pm\pi/2$, we have $x = y = 0$ and $z = \pm 1$, the north and south poles of the
sphere. It follows that f already covers the entire image even if we restrict $\theta$ and $\varphi$
to the rectangular domain $U^2$:

\[ -\pi \le \theta \le \pi, \qquad -\frac{\pi}{2} \le \varphi \le \frac{\pi}{2}. \]




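[Figure: the $(\theta,\varphi)$-plane with the marked points a and b, and the unit sphere with their images f(a) and f(b); $\theta$ = longitude, $\varphi$ = latitude.]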


Note how the images of the $\theta$- and $\varphi$-axes appear on the sphere, as the equator
and the prime meridian, respectively. The parameters $\theta$ and $\varphi$ are evidently just the
familiar longitude and latitude. The points a and b that are marked on the $(\theta,\varphi)$-plane,
and their images f(a) and f(b) on the sphere, are the two sites where we
compare f with its derivative.

We first view the action of f itself in a microscope window centered at each point,
and then compare that with the action of the derivative at that point. The source is
2-dimensional, so a window will still be a small square. The target, however, is
3-dimensional, so each target window will be a small cube.

The first point $\mathbf{a} = (\theta,\varphi) = (\pi/2, 0)$ has its image f(a) on the equator at 90° east
longitude, a point that has target coordinates $(x,y,z) = (0,1,0)$. In the figure below,
the microscope window centered at $(\theta,\varphi) = (\pi/2, 0)$ is a square 0.2 units on a side;
the target window is a cube of the same dimensions centered at $(x,y,z) = (0,1,0)$.
Following the figure is Mathematica code that produces the image of the $(\Delta\theta, \Delta\varphi)$-plane
in the target window.


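[Figure: the source window in the $(\Delta\theta,\Delta\varphi)$-plane centered at $\theta = \pi/2$, $\varphi = 0$, and its image in the cubical target window.]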



(* Sphere parametrization of the text: u = longitude, v = latitude,
   so x = cos u cos v, y = sin u cos v, z = sin v.
   The window is centered at (u, v) = (Pi/2, 0), whose image is (0, 1, 0). *)
ParametricPlot3D[{Cos[u] Cos[v], Sin[u] Cos[v], Sin[v]},
    {u, Pi/2 - 0.1, Pi/2 + 0.1}, {v, -0.1, 0.1},
    PlotRange->{{-0.1,0.1}, {0.9,1.1}, {-0.1,0.1}},
    ViewPoint->{3.103, 2.109, 2.299}, PlotPoints->9]

The image is the portion of the sphere that lies in this window; it is nearly flat
because the window is small. It approximates the plane $\Delta y = 0$, that is, the $(\Delta x, \Delta z)$-plane.
It appears that f preserves lengths and angles: the image grid has the same
size and shape as the source grid. The image of the $\Delta\varphi$-axis does not quite coincide
with the $\Delta z$-axis, but is tangent to it. Likewise, the image of the $\Delta\theta$-axis is tangent
to the $\Delta x$-axis, but has the opposite orientation.

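Let us now determine the action of the derivative $d\mathbf{f}_{(\pi/2,0)}$ in the same microscope
window. At an arbitrary point $(\theta,\varphi)$, the derivative map $d\mathbf{f}_{(\theta,\varphi)}: \mathbb{R}^2 \to \mathbb{R}^3 : \Delta\boldsymbol{\theta} \mapsto \Delta\mathbf{x}$
is given by the 3 × 2 matrix

\[ d\mathbf{f}_{(\theta,\varphi)} = \begin{pmatrix} -\sin\theta\cos\varphi & -\cos\theta\sin\varphi \\ \cos\theta\cos\varphi & -\sin\theta\sin\varphi \\ 0 & \cos\varphi \end{pmatrix}. \]

For each $(\theta,\varphi)$ (except when $\cos\varphi = 0$), the image will therefore be a plane in $\mathbb{R}^3$.
When $(\theta,\varphi) = (\pi/2, 0)$, the map $\Delta\mathbf{x} = d\mathbf{f}_{(\pi/2,0)}(\Delta\boldsymbol{\theta})$ is

\[ \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \Delta\theta \\ \Delta\varphi \end{pmatrix}, \]

or just

\[ \Delta x = -\Delta\theta, \qquad \Delta y = 0, \qquad \Delta z = \Delta\varphi. \]

This is relatively easy to interpret. The equation $\Delta y = 0$ tells us the image is the
$(\Delta x, \Delta z)$-plane. The $\Delta\varphi$-axis is mapped to the $\Delta z$-axis without stretching, and the
$\Delta\theta$-axis is mapped to the $\Delta x$-axis, reversing direction but without stretching.

Our visual evidence indicates that $d\mathbf{f}_{(\pi/2,0)}$ is just the “flattening-out” of f in a
microscope window centered at $(\pi/2, 0)$. It seems reasonable to say that f is locally
linear at $(\pi/2, 0)$ and “looks like” its derivative there.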


At the second point $\mathbf{b} = (\theta,\varphi) = (\pi/4, \pi/3)$, both f and its derivative are a
bit more complicated to describe. The image f(b) lies in the northern hemisphere,
at 60° north latitude and 45° east longitude; its target coordinates are
$(x,y,z) = (\sqrt{2}/4, \sqrt{2}/4, \sqrt{3}/2)$.




[Figure: source and target microscope windows at $\theta = \pi/4$, $\varphi = \pi/3$.]


In the figure above, the microscope windows (both the square and the cube) are
again 0.2 units on a side. The image is nearly flat, and is only about half as wide
as it is tall. The image of the $\Delta\theta$-axis is horizontal; that is, it lies in the $(\Delta x, \Delta y)$-plane.
The image of the $\Delta\varphi$-axis lies in the vertical plane where $\Delta x = \Delta y$. It would
seem that we cannot tell by eye the angle between this image and the $\Delta z$-axis. But
remember that we are just viewing a small portion of the sphere at 60° north latitude.
That makes it obvious that the image of the $\Delta\varphi$-axis makes an angle of 60° with the
vertical at the center of the window.

The sides of the image seem pinched together, more so at the top than at the 
bottom; the grid of latitude and longitude lines is relatively far from rectangular. 
This is only to be expected, though. Away from the equator, longitude lines do 
pinch together toward the poles. In a linear map, parallel lines always have parallel 
images, so we must conclude that f does not look linear, at least at this scale. 

But according to Taylor’s theorem, f is indeed locally linear (everywhere away 
from the poles). We see that better when we magnify the view; the figure below is a 
tenfold magnification over the previous one. 




[Figure: the same windows at $\theta = \pi/4$, $\varphi = \pi/3$ under tenfold magnification.]


At this magnification, the quadrilaterals in the image grid now look like congruent
rectangles, so f now looks like a linear map. In the $\Delta\varphi$-direction, lengths are unaltered
(the image rectangles are as tall as the original squares in the source), but in
the $\Delta\theta$-direction, lengths are halved.

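The map $\Delta\mathbf{x} = d\mathbf{f}_{(\pi/4,\pi/3)}(\Delta\boldsymbol{\theta})$ defined by the derivative is

\[ \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \frac{1}{4} \begin{pmatrix} -\sqrt{2} & -\sqrt{6} \\ \sqrt{2} & -\sqrt{6} \\ 0 & 2 \end{pmatrix} \begin{pmatrix} \Delta\theta \\ \Delta\varphi \end{pmatrix}. \]

The two vectors

\[ \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} -\sqrt{2}/4 \\ \sqrt{2}/4 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} -\sqrt{6}/4 \\ -\sqrt{6}/4 \\ 2/4 \end{pmatrix} \]
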

are the images of the unit vectors on the $\Delta\theta$- and $\Delta\varphi$-axes, respectively. We can see
immediately that these image vectors are orthogonal, and you can check that their
lengths are 1/2 and 1, respectively. Thus, a square grid in the $(\Delta\theta, \Delta\varphi)$-plane has for
its image a rectangular grid with rectangles exactly half as wide as the squares. The
image of the $\Delta\theta$-axis lies in the $(\Delta x, \Delta y)$-plane because the image vector has $\Delta z = 0$.
For a similar reason, the image of the $\Delta\varphi$-axis lies in the vertical plane $\Delta x = \Delta y$.
Moreover, because the dot product of the image vector with the unit vector in the
$\Delta z$-direction is 2/4 = 1/2, the image makes an angle of 60° with the $\Delta z$-axis.

Once again we have compelling visual evidence. This time it indicates that
$d\mathbf{f}_{(\pi/4,\pi/3)}$ matches f in a sufficiently small microscope window centered at the
point $(\pi/4, \pi/3)$. It seems reasonable to say that f is locally linear at $(\pi/4, \pi/3)$
and “looks like” its derivative there.
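The claims about the two image vectors (orthogonality, lengths 1/2 and 1, and the 60° angle with the vertical) are easy to confirm numerically. The sketch below uses helper names of our own; only the parametrization $x = \cos\theta\cos\varphi$, $y = \sin\theta\cos\varphi$, $z = \sin\varphi$ is taken from the text.

```python
import math

def sphere_derivative(theta, phi):
    """3 x 2 derivative matrix of the sphere parametrization at (theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[-st * cp, -ct * sp], [ct * cp, -st * sp], [0.0, cp]]

def column(m, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in m]

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

m = sphere_derivative(math.pi / 4, math.pi / 3)
e_theta, e_phi = column(m, 0), column(m, 1)     # images of the unit vectors
len_theta = math.sqrt(dot(e_theta, e_theta))
len_phi = math.sqrt(dot(e_phi, e_phi))
# Angle between the image of the latitude axis and the vertical direction.
angle_with_z = math.degrees(math.acos(e_phi[2] / len_phi))
print(dot(e_theta, e_phi), len_theta, len_phi, angle_with_z)
```

Up to rounding, the printed values should be 0, 1/2, 1, and 60.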

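We noted that unit vectors on the $\Delta\theta$- and $\Delta\varphi$-axes are mapped to orthogonal
vectors in the target that have lengths 1/2 and 1, respectively. Thus, a unit square
maps to a rectangle with area 1/2; the local area multiplier at $(\theta,\varphi) = (\pi/4, \pi/3)$
appears to be 1/2. In fact, the area multiplier for the linear map $d\mathbf{f}_{(\pi/4,\pi/3)}: \mathbb{R}^2 \to \mathbb{R}^3$
is (Theorem 2.25, p. 55)

\[ \sqrt{ \begin{vmatrix} \frac{\sqrt{2}}{4} & -\frac{\sqrt{6}}{4} \\ 0 & \frac{2}{4} \end{vmatrix}^2 + \begin{vmatrix} 0 & \frac{2}{4} \\ -\frac{\sqrt{2}}{4} & -\frac{\sqrt{6}}{4} \end{vmatrix}^2 + \begin{vmatrix} -\frac{\sqrt{2}}{4} & -\frac{\sqrt{6}}{4} \\ \frac{\sqrt{2}}{4} & -\frac{\sqrt{6}}{4} \end{vmatrix}^2 } = \sqrt{\frac{1}{32} + \frac{1}{32} + \frac{3}{16}} = \frac{1}{2}. \]

A similar calculation at $(\theta,\varphi) = (\pi/2, 0)$ (using the matrix for $d\mathbf{f}_{(\pi/2,0)}$) gives a
local area magnification factor of

\[ \sqrt{ \begin{vmatrix} 0 & 0 \\ 0 & 1 \end{vmatrix}^2 + \begin{vmatrix} 0 & 1 \\ -1 & 0 \end{vmatrix}^2 + \begin{vmatrix} -1 & 0 \\ 0 & 0 \end{vmatrix}^2 } = \sqrt{0 + 1 + 0} = 1; \]

this agrees with our discussion of the local action of f in the microscope window at
$(\pi/2, 0)$.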

These examples lead us to the following definition. 

Definition 4.5 If the surface parametrization $\mathbf{f}: U^2 \to \mathbb{R}^3$ is differentiable at a, its
local area multiplier is the area multiplier of its derivative $d\mathbf{f}_{\mathbf{a}}: \mathbb{R}^2 \to \mathbb{R}^3$.

At an arbitrary point $(\theta,\varphi)$ in the domain of the sphere parametrization, the local
area multiplier is $\cos\varphi$; see the exercises.
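As a numerical spot check (not a proof; the derivation is left to the exercises), the sketch below computes the area multiplier of the sphere's derivative as the square root of the sum of its squared 2 × 2 minors, which is how the two displayed calculations above proceed. The function names and the third sample point are our own assumptions.

```python
import math

def sphere_derivative(theta, phi):
    """3 x 2 derivative matrix of x = cos(t)cos(p), y = sin(t)cos(p), z = sin(p)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[-st * cp, -ct * sp], [ct * cp, -st * sp], [0.0, cp]]

def area_multiplier(m):
    """Square root of the sum of the squared 2 x 2 minors of a 3 x 2 matrix."""
    row_pairs = [(1, 2), (2, 0), (0, 1)]
    minors = [m[i][0] * m[j][1] - m[i][1] * m[j][0] for i, j in row_pairs]
    return math.sqrt(sum(d * d for d in minors))

# First two points are the ones in the text; the third is an arbitrary extra.
for theta, phi in [(math.pi / 4, math.pi / 3), (math.pi / 2, 0.0), (-2.0, -1.1)]:
    print(area_multiplier(sphere_derivative(theta, phi)), math.cos(phi))
```

At each sample point the two printed numbers should agree, consistent with the multiplier $\cos\varphi$ on the domain $-\pi/2 \le \varphi \le \pi/2$.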

The next example is called a crosscap. It has a simple parametrization $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^3$
in terms of polynomials defined on the entire plane.

\[ \mathbf{f}: \begin{cases} x = u, \\ y = uv, \\ z = -v^2. \end{cases} \]

[Figure: the $(u,v)$-plane with the marked points a = (1,0), b = (−1,1), c = (0,0), and the crosscap with their images f(a) = (1,0,0), f(b) = (−1,−1,−1), f(c) = (0,0,0).]

The image is a kind of parabolic arch that “crosses through” itself in the way shown 
in the figure. The u-axis is mapped to the x-axis along the ridge of the arch. The
image of the v-axis folds back on itself along the line of self-intersection; both halves 
map to the negative z-axis. We do a local analysis at three different points. This time, 
though, we first compute the derivative and then compare it to the map itself. 

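At an arbitrary point $\mathbf{p} = (p,q)$, the derivative $d\mathbf{f}_{\mathbf{p}}: \mathbb{R}^2 \to \mathbb{R}^3$ is given by the 3 × 2
matrix

\[ d\mathbf{f}_{\mathbf{p}} = \begin{pmatrix} 1 & 0 \\ q & p \\ 0 & -2q \end{pmatrix}. \]

At $(p,q) = \mathbf{a} = (1,0)$, the derivative $d\mathbf{f}_{\mathbf{a}}$ is the map

\[ \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix}, \quad\text{or}\quad \begin{aligned} \Delta x &= \Delta u, \\ \Delta y &= \Delta v, \\ \Delta z &= 0. \end{aligned} \]

This is just the identity map of the $(\Delta u, \Delta v)$-plane to the $(\Delta x, \Delta y)$-plane in the target.
All lengths and angles are preserved; the local area magnification factor is therefore
equal to 1.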

Compare this with the action of f itself in a square microscope window, 0.2 units
on a side, centered at $(p,q) = (1,0)$. Its image is the portion of the crosscap that
appears in the small cubical window of the same dimensions, centered at
f(a) = (1,0,0).

[Figure: the source window centered at u = 1, v = 0 and its image in the target window centered at (1,0,0).]

Apart from the slight curving (which would become even less noticeable if we increased
the magnification), the image is the same size and shape as the source:
f essentially preserves all lengths and angles in mapping the microscope window to
the target, so f “looks like” $d\mathbf{f}_{(1,0)}$ near (1,0).

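At $(p,q) = \mathbf{b} = (-1,1)$, the derivative $d\mathbf{f}_{\mathbf{b}}$ is the map

\[ \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \\ 0 & -2 \end{pmatrix} \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix}, \quad\text{or}\quad \begin{aligned} \Delta x &= \Delta u, \\ \Delta y &= \Delta u - \Delta v, \\ \Delta z &= -2\,\Delta v. \end{aligned} \]

As always, the image is spanned by the images of the unit vectors in the $\Delta u$- and
$\Delta v$-directions. In this case, the spanning vectors are (1,1,0) and (0,−1,−2) in
$(\Delta x, \Delta y, \Delta z)$-space. The first lies in the $(\Delta x, \Delta y)$-plane, and the second lies in the
$(\Delta y, \Delta z)$-plane, pointing downward (i.e., in the negative $\Delta z$-direction). The image of
a coordinate grid of unit squares in the source consists of congruent parallelograms
whose sides have lengths $\sqrt{2}$ and $\sqrt{5}$. Locally, areas are tripled (see the exercises).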
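The side lengths and the area factor can be checked with a cross product, whose length is the area of the parallelogram spanned by the two column vectors. This is a sketch with names of our own; the three sample points are the ones marked a, b, and c in the text.

```python
import math

def crosscap_derivative(p, q):
    """3 x 2 derivative matrix at (p, q) of x = u, y = uv, z = -v^2."""
    return [[1.0, 0.0], [q, p], [0.0, -2.0 * q]]

def cross(c1, c2):
    """Cross product of two vectors in 3-space."""
    return [c1[1] * c2[2] - c1[2] * c2[1],
            c1[2] * c2[0] - c1[0] * c2[2],
            c1[0] * c2[1] - c1[1] * c2[0]]

def norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def local_data(p, q):
    """Lengths of the two spanning vectors and the local area multiplier."""
    m = crosscap_derivative(p, q)
    c1, c2 = [row[0] for row in m], [row[1] for row in m]
    return (norm(c1), norm(c2), norm(cross(c1, c2)))

for name, point in [("a", (1.0, 0.0)), ("b", (-1.0, 1.0)), ("c", (0.0, 0.0))]:
    print(name, local_data(*point))
```

At b the three numbers should be $\sqrt{2}$, $\sqrt{5}$, and 3; at a the area multiplier is 1, and at c it drops to 0.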

The figure above shows the action of f itself in a microscope window centered at
$(p,q) = \mathbf{b} = (-1,1)$. The source window is a square 0.2 units on a side; the image
cube is larger, about 0.4 units on a side, so that it can contain the entire image. As
we see, the image of the $\Delta u$-axis lies in the $(\Delta x, \Delta y)$-plane, whereas the image of the
$\Delta v$-axis lies in the $(\Delta y, \Delta z)$-plane, oriented so that the positive $\Delta v$-axis points down.
The image coordinate grid appears to consist of congruent parallelograms that are
taller than they are wide. The same figure serves to represent both $d\mathbf{f}_{\mathbf{b}}$ and f near b;
it shows that f “looks like” its linear approximation near b = (−1,1).

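At the origin the situation is not so simple. Perhaps this is to be expected, because
it is the place where the crosscap “crosses” itself. The derivative $d\mathbf{f}_{\mathbf{c}}$ is the linear map

\[ \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix}, \quad\text{or}\quad \begin{aligned} \Delta x &= \Delta u, \\ \Delta y &= 0, \\ \Delta z &= 0. \end{aligned} \]

The rank of the matrix has dropped to 1. This means the image has dimension 1
instead of 2; instead of a plane, it has collapsed to a line. Indeed, the equations
indicate the image is just the $\Delta x$-axis. Furthermore, the local area magnification
factor is equal to 0.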

In the exercises you compute the window equation

\[ \Delta\mathbf{x} = \mathbf{f}(\mathbf{p}+\Delta\mathbf{u}) - \mathbf{f}(\mathbf{p}) = d\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u}) + \mathbf{R}_{1,\mathbf{p}}(\Delta\mathbf{u}) \]

for the crosscap map f at an arbitrary point p. Because f is a simple quadratic map,
you get an explicit (quadratic) formula for the remainder $\mathbf{R}_{1,\mathbf{p}}$. At the origin,
p = c = 0, the formula for $\Delta\mathbf{x}$ reduces to

\[ \Delta\mathbf{x} = d\mathbf{f}_{\mathbf{0}}(\Delta\mathbf{u}) + \mathbf{R}_{1,\mathbf{0}}(\Delta\mathbf{u}), \qquad \begin{aligned} \Delta x &= \Delta u + 0 &&= O(1), \\ \Delta y &= 0 + \Delta u\,\Delta v &&= O(2), \\ \Delta z &= 0 + -(\Delta v)^2 &&= O(2). \end{aligned} \]

Thus, in a microscope window centered at x = 0, the values of $\Delta y$ and $\Delta z$ are an
order of magnitude smaller than $\Delta x$. And that is what we see in the figure below.




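[Figure: the source window centered at u = 0, v = 0 and its image, a narrow tube along the $\Delta x$-axis in the target window.]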


The image has been squeezed in the $\Delta v$-direction so that it fits into a narrow tube
along the $\Delta x$-axis. The source and target windows are both 0.2 units on a side, but
the tube’s dimensions are only 0.01 × 0.01 in the $\Delta y$- and $\Delta z$-directions; they are an
order of magnitude smaller than the long dimension.

Does f “look like” $d\mathbf{f}_{\mathbf{0}}$ in this window? Not quite. In the $\Delta x$-direction, the derivative
$d\mathbf{f}_{\mathbf{0}}$ vanishes only to order 1 but the remainder $\mathbf{R}_{1,\mathbf{0}}$ vanishes to infinite order.
The derivative dominates, and f does indeed look like $d\mathbf{f}_{\mathbf{0}}$ in that direction. But in
the $\Delta y$- and $\Delta z$-directions, $d\mathbf{f}_{\mathbf{0}}$ now vanishes to infinite order, but $\mathbf{R}_{1,\mathbf{0}}$ (and thus f
itself) vanishes only to order 2. The remainder dominates; f therefore looks like
the remainder in those directions, and not like its derivative. So, even though f is
“well-approximated” by its derivative at u = 0 (i.e., even though f is differentiable),
it does not look like its derivative in a microscope window centered there.
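The difference in orders can be made concrete by evaluating the crosscap at a corner of ever smaller windows centered at the origin. This is a sketch; the function name and the choice of the corner point $(h,h)$ are ours.

```python
def crosscap(u, v):
    """The crosscap parametrization x = u, y = uv, z = -v^2."""
    return (u, u * v, -v * v)

# Since f(0, 0) = (0, 0, 0), the window coordinates of the image of
# (du, dv) are just crosscap(du, dv).
for h in [0.1, 0.01, 0.001]:
    dx, dy, dz = crosscap(h, h)          # corner of the window of half-width h
    print(h, dx, dy, dz, abs(dy) / abs(dx), abs(dz) / abs(dx))
```

The last two columns shrink in proportion to h: the $\Delta y$ and $\Delta z$ components are of second order while $\Delta x$ is of first order, which is why the image collapses toward the $\Delta x$-axis as the window shrinks.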

In our study of the quadratic map in the previous section (pp. 116-121), we noted
that it failed to look like its derivative locally at a point where
the derivative itself failed to be invertible. Invertibility is out of the question here, because
the source and target have different dimensions (the derivative is not a square
matrix). The proper analogue is maximal rank:

If df a has maximal rank, f will look like df a near a. 

We investigate this point further in the chapter on implicit functions (Chapter 6).


4.4 The chain rule


In elementary calculus, every formula is assembled from a few simple types of 
functions—think of them as “atoms”—by using arithmetic and composition. We 
calculate the derivative of any such formula by knowing the derivatives of the atoms 
and the rules for differentiating arbitrary sums, products, quotients, and chains (or 
compositions). In this section we develop rules for derivatives of maps that generalize
the sum, product, and chain rules. As in the one-variable case, the most important
is the chain rule; with it we show how the derivatives of a map and its inverse are 
related. 

Before considering the differentiation rules, we must say what it means for a map 
between spaces of arbitrary dimension to be differentiable. Our definition is just the 
generalization of the ones we have used in special cases (Definitions 4.1, p. 106 
and 4.3, p. 115); $U^p$ is a window of the form $|u_i - a_i| < q_i$, $i = 1, \ldots, p$.

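Definition 4.6 The map $\mathbf{f}: U^p \to \mathbb{R}^q$ is differentiable, or locally linear, at u = a if
there is a linear map $\mathbf{L}: \mathbb{R}^p \to \mathbb{R}^q$, called the derivative of f at a, for which

\[ \mathbf{f}(\mathbf{a}+\Delta\mathbf{u}) = \mathbf{f}(\mathbf{a}) + \mathbf{L}(\Delta\mathbf{u}) + o(1). \]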


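Theorem 4.6. Suppose $\mathbf{f}: U^p \to \mathbb{R}^q$ is differentiable at u = a; then $\mathbf{L} = d\mathbf{f}_{\mathbf{a}}$. In
particular, if the component functions of f are $f_i(\mathbf{u})$, then all the partial derivatives
$\partial f_i/\partial u_j(\mathbf{a})$ exist.

Proof. Suppose the element in the ith row and jth column of the matrix representing
L is $\ell_{ij}$. We show that the partial derivative $\partial f_i/\partial u_j(\mathbf{a})$ exists and is equal to $\ell_{ij}$.
Let $\mathbf{e}_j = (0,\ldots,0,1,0,\ldots,0)$, the vector in $\mathbb{R}^p$ with 1 in the jth coordinate and 0
elsewhere. Then, by definition,

\[ \frac{\partial f_i}{\partial u_j}(\mathbf{a}) = \lim_{h\to 0} \frac{f_i(\mathbf{a}+h\mathbf{e}_j) - f_i(\mathbf{a})}{h}. \]

By hypothesis, $f_i(\mathbf{a}+h\mathbf{e}_j) - f_i(\mathbf{a}) = L_i(h\mathbf{e}_j) + R_i(h\mathbf{e}_j) = hL_i(\mathbf{e}_j) + R_i(h\mathbf{e}_j)$, where $L_i$
is the ith component of L and $R_i$ is the ith component of the remainder map that is
represented by the symbol o(1). Therefore,

\[ \frac{\partial f_i}{\partial u_j}(\mathbf{a}) = \lim_{h\to 0} \frac{hL_i(\mathbf{e}_j) + R_i(h\mathbf{e}_j)}{h} = L_i(\mathbf{e}_j) + \lim_{h\to 0} \frac{R_i(h\mathbf{e}_j)}{h} = \ell_{ij}. \]

The final equation holds because $L_i(\mathbf{e}_j) = \ell_{ij}$, and $R_i(h\mathbf{e}_j)/h \to 0$ is simply a consequence
of $R_i(\Delta\mathbf{u}) = o(1)$. □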

The theorem shows L is unique. We continue to equate differentiability and local
linearity: f is differentiable at a if and only if $\Delta\mathbf{x} = \mathbf{f}(\mathbf{a}+\Delta\mathbf{u}) - \mathbf{f}(\mathbf{a})$ agrees with the
linear map $d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})$ to order greater than 1 as $\Delta\mathbf{u} \to \mathbf{0}$. We use this characterization
repeatedly in the rest of the section.

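Given two maps $\mathbf{f}, \mathbf{g}: U^p \to \mathbb{R}^q$, their sum and difference,

\[ (\mathbf{f}\pm\mathbf{g})(\mathbf{u}) = \mathbf{f}(\mathbf{u}) \pm \mathbf{g}(\mathbf{u}), \]

are themselves maps of the same type: $\mathbf{f}\pm\mathbf{g}: U^p \to \mathbb{R}^q$. The following theorem extends
to maps the familiar rule: the derivative of a sum is the sum of the derivatives.

Theorem 4.7. If f and g are differentiable on $U^p$, then so are $\mathbf{f}\pm\mathbf{g}$, and

\[ d(\mathbf{f}\pm\mathbf{g})_{\mathbf{a}}(\Delta\mathbf{u}) = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) \pm d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) \]

at any point a in $U^p$.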

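Proof. The statements are probably intuitively clear, but we prove them to illustrate
our characterization of a differentiable map. Because f and g are differentiable at a,
we can write

\[ \begin{aligned}
(\mathbf{f}\pm\mathbf{g})(\mathbf{a}+\Delta\mathbf{u}) &= \mathbf{f}(\mathbf{a}+\Delta\mathbf{u}) \pm \mathbf{g}(\mathbf{a}+\Delta\mathbf{u}) \\
&= \mathbf{f}(\mathbf{a}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1) \pm \bigl(\mathbf{g}(\mathbf{a}) + d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1)\bigr) \\
&= (\mathbf{f}\pm\mathbf{g})(\mathbf{a}) + (d\mathbf{f}_{\mathbf{a}} \pm d\mathbf{g}_{\mathbf{a}})(\Delta\mathbf{u}) + o(1).
\end{aligned} \]

Thus $(\mathbf{f}\pm\mathbf{g})(\mathbf{a}+\Delta\mathbf{u}) - (\mathbf{f}\pm\mathbf{g})(\mathbf{a})$ agrees with the linear map $(d\mathbf{f}_{\mathbf{a}} \pm d\mathbf{g}_{\mathbf{a}})(\Delta\mathbf{u})$ to
order greater than 1; by Theorem 4.6, the maps $\mathbf{f}\pm\mathbf{g}$ are differentiable at a, and
$d(\mathbf{f}\pm\mathbf{g})_{\mathbf{a}} = d\mathbf{f}_{\mathbf{a}} \pm d\mathbf{g}_{\mathbf{a}}$. □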

There are product rules, too, at least when the products themselves are defined
in meaningful ways. For example, we can compute the ordinary cross-product (or
vector product) of maps whose target is $\mathbb{R}^3$, and we can compute the dot product (or
scalar product) of any maps whose targets have the same dimension.

Theorem 4.8. If $\mathbf{f}, \mathbf{g}: U^p \to \mathbb{R}^q$ are differentiable on $U^p$, then so is the scalar product
$\mathbf{f}\cdot\mathbf{g}: U^p \to \mathbb{R}$, and

\[ d(\mathbf{f}\cdot\mathbf{g})_{\mathbf{a}}(\Delta\mathbf{u}) = \mathbf{f}(\mathbf{a})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot\mathbf{g}(\mathbf{a}). \]

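Proof. By definition of the scalar product function, and then by the differentiability
of f and g, we have

\[ \begin{aligned}
(\mathbf{f}\cdot\mathbf{g})(\mathbf{a}+\Delta\mathbf{u}) &= \mathbf{f}(\mathbf{a}+\Delta\mathbf{u})\cdot\mathbf{g}(\mathbf{a}+\Delta\mathbf{u}) \\
&= \bigl(\mathbf{f}(\mathbf{a}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1)\bigr)\cdot\bigl(\mathbf{g}(\mathbf{a}) + d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1)\bigr).
\end{aligned} \]

When we expand the right-hand side, we get nine individual scalar terms, five of
which have o(1) as a factor. Those five therefore all vanish to order greater than 1,
so we combine them into a single (scalar) symbol o(1):

\[ (\mathbf{f}\cdot\mathbf{g})(\mathbf{a}+\Delta\mathbf{u}) = \mathbf{f}(\mathbf{a})\cdot\mathbf{g}(\mathbf{a}) + \mathbf{f}(\mathbf{a})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot\mathbf{g}(\mathbf{a}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1). \]

To decide whether $\mathbf{f}\cdot\mathbf{g}$ is differentiable, Theorem 4.6 suggests we rewrite the last
equation (after setting $\mathbf{f}(\mathbf{a})\cdot\mathbf{g}(\mathbf{a}) = (\mathbf{f}\cdot\mathbf{g})(\mathbf{a})$) in the form

\[ (\mathbf{f}\cdot\mathbf{g})(\mathbf{a}+\Delta\mathbf{u}) - (\mathbf{f}\cdot\mathbf{g})(\mathbf{a}) = \mathbf{f}(\mathbf{a})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot\mathbf{g}(\mathbf{a}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + o(1). \]

Let us now consider, in turn, the first three terms on the right. We know $d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u})$ is
a linear function of $\Delta\mathbf{u}$, and so is its dot product with the constant vector f(a). The second
term is likewise a linear function of $\Delta\mathbf{u}$. In the third term, each of the two factors
is linear; by Exercise 3.28 (p. 104), each factor vanishes at least to order 1. That is,
there are constants $C_f$ and $C_g$ such that

\[ \|d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\| \le C_f\|\Delta\mathbf{u}\|, \qquad \|d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u})\| \le C_g\|\Delta\mathbf{u}\|. \]

Therefore, because $|A\cdot B| \le \|A\|\,\|B\|$ for any two vectors in $\mathbb{R}^q$,

\[ |d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u})| \le \|d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\|\,\|d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u})\| \le C_f C_g\|\Delta\mathbf{u}\|^2, \]

implying that $d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u})$ is O(2) and hence can be absorbed into the term
denoted o(1). Thus we see $(\mathbf{f}\cdot\mathbf{g})(\mathbf{a}+\Delta\mathbf{u}) - (\mathbf{f}\cdot\mathbf{g})(\mathbf{a})$ agrees with the linear function
$\mathbf{f}(\mathbf{a})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot\mathbf{g}(\mathbf{a})$ to order greater than 1, so it follows that
$d(\mathbf{f}\cdot\mathbf{g})_{\mathbf{a}}(\Delta\mathbf{u}) = \mathbf{f}(\mathbf{a})\cdot d\mathbf{g}_{\mathbf{a}}(\Delta\mathbf{u}) + d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\cdot\mathbf{g}(\mathbf{a})$. □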
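The product rule can be tested numerically on two concrete maps. In the sketch below, the choice of f (the quadratic map of Section 4.2) and g (the polar coordinate map, with u playing the role of the radius and v the angle) is ours, purely for illustration, as are the base point and the direction of the displacement.

```python
import math

def f(u, v):
    return (u * u - v * v, 2.0 * u * v)

def df(u, v, du, dv):
    """Derivative of f at (u, v) applied to (du, dv)."""
    return (2 * u * du - 2 * v * dv, 2 * v * du + 2 * u * dv)

def g(u, v):
    return (u * math.cos(v), u * math.sin(v))

def dg(u, v, du, dv):
    """Derivative of g at (u, v) applied to (du, dv)."""
    return (math.cos(v) * du - u * math.sin(v) * dv,
            math.sin(v) * du + u * math.cos(v) * dv)

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

def product_rule_error(a, b, du, dv):
    """Size of (f.g)(a+du) - (f.g)(a) - [f(a).dg_a(du) + df_a(du).g(a)]."""
    exact = dot(f(a + du, b + dv), g(a + du, b + dv)) - dot(f(a, b), g(a, b))
    linear = dot(f(a, b), dg(a, b, du, dv)) + dot(df(a, b, du, dv), g(a, b))
    return abs(exact - linear)

for t in [1e-1, 1e-2, 1e-3]:
    print(t, product_rule_error(1.0, 0.5, t, -2 * t))
```

Each tenfold reduction of the displacement should reduce the error roughly a hundredfold, the behavior expected of a remainder that vanishes to order 2.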

We turn now to the chain rule. For functions of a single variable, the chain rule
is commonly written two different ways, corresponding to the two ways we write
derivatives. Suppose $s = f(u)$ and $x = \varphi(s)$; then $x = \varphi(f(u))$ and x therefore depends
on u through the action of a new function composed of $f$ and $\varphi$. We write
the composed function as $x = (\varphi\circ f)(u)$, and write its derivative in terms of the
derivatives of the components $f$ and $\varphi$ as either

\[ \frac{dx}{du} = \frac{dx}{ds}\,\frac{ds}{du} \quad\text{or}\quad (\varphi\circ f)'(a) = \varphi'(f(a))\,f'(a). \]

These are two formulations of the chain rule. The first uses the Leibniz notation for
derivatives; its appeal is that it looks like an ordinary rule for multiplying fractions.
The second calls attention to the fact that the derivatives of the individual functions
$\varphi$ and $f$ must be evaluated at different points. It also reminds us that x depends on u
one way (namely, through $\varphi\circ f$) but on s a different way (namely, through $\varphi$ alone).
The Leibniz notations dx/du and dx/ds suggest the same thing, though somewhat
more obliquely.

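Now suppose $\mathbf{f}: U^p \to \mathbb{R}^q$ and $\boldsymbol{\varphi}: S^q \to \mathbb{R}^r$ are differentiable maps with the
image of $\mathbf{f}$ contained in the domain of $\boldsymbol{\varphi}$: $\mathbf{f}(U^p) \subset S^q$. Then the composite $\boldsymbol{\varphi}\circ\mathbf{f}:
U^p \to \mathbb{R}^r$ is defined for all $\mathbf{u}$ in $U^p$: $(\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{u}) = \boldsymbol{\varphi}(\mathbf{f}(\mathbf{u}))$. Visually, we can think of
$\mathbf{f}$ and $\boldsymbol{\varphi}$ as maps coming one after another in a linear “chain” as on the left, below.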


The chain rule for one-variable functions

Diagrams of maps

The chain rule for maps


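However, to show how they are related to the composite map $\boldsymbol{\varphi}\circ\mathbf{f}$, it is natural to
put the three maps in a triangle:

[Diagram: a triangle of maps, with $\mathbf{f}: U^p \to S^q$ and $\boldsymbol{\varphi}: S^q \to \mathbb{R}^r$ along two sides and the composite $\boldsymbol{\varphi}\circ\mathbf{f}: U^p \to \mathbb{R}^r$ along the third.]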


The chain rule for maps says that the derivative of a composite is the composite 
of its derivatives. We state and prove the theorem below. You should check that the 
proof is just a rigorous version of the following “plausibility argument” based on 
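the microscope equations for $\mathbf{f}$, $\boldsymbol{\varphi}$, and $\boldsymbol{\varphi}\circ\mathbf{f}$. Starting with

\[
\Delta\mathbf{s} \approx d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) \quad\text{and}\quad \Delta\mathbf{x} \approx d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}(\Delta\mathbf{s}),
\]

it follows that

\[
\Delta\mathbf{x} \approx d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\bigr) = \bigl(d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\circ d\mathbf{f}_{\mathbf{a}}\bigr)(\Delta\mathbf{u}).
\]

But, by definition, $\Delta\mathbf{x} \approx d(\boldsymbol{\varphi}\circ\mathbf{f})_{\mathbf{a}}(\Delta\mathbf{u})$; because the linear map in the microscope
equation is unique, $d(\boldsymbol{\varphi}\circ\mathbf{f})_{\mathbf{a}} = d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\circ d\mathbf{f}_{\mathbf{a}}$.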

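Theorem 4.9 (Chain rule). If $\mathbf{f}: U^p \to S^q$ is differentiable at $\mathbf{a}$, and $\boldsymbol{\varphi}: S^q \to \mathbb{R}^r$ is
differentiable at $\mathbf{f}(\mathbf{a})$, then the composite map $\boldsymbol{\varphi}\circ\mathbf{f}: U^p \to \mathbb{R}^r$ is differentiable at $\mathbf{a}$
and

\[
d(\boldsymbol{\varphi}\circ\mathbf{f})_{\mathbf{a}} = d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\circ d\mathbf{f}_{\mathbf{a}}.
\]

[Diagrams: the triangle of maps on $U^p$ with composite $\boldsymbol{\varphi}\circ\mathbf{f}$, and the corresponding triangle of derivatives from $\mathbb{R}^p$ to $\mathbb{R}^r$ with composite $d(\boldsymbol{\varphi}\circ\mathbf{f})_{\mathbf{a}}$.]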

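Proof. It is possible to prove this result in terms of the component functions of
the maps. However, we work directly with the maps themselves. According to our
characterization of differentiability (and taking into account that $d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}(d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})) =
(d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\circ d\mathbf{f}_{\mathbf{a}})(\Delta\mathbf{u})$ by definition), we must therefore show

\[
(\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{a}+\Delta\mathbf{u}) - (\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{a}) = d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\bigr) + \mathbf{o}(1).
\]

This will prove that the derivative of $\boldsymbol{\varphi}\circ\mathbf{f}$ at $\mathbf{a}$ is $d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\circ d\mathbf{f}_{\mathbf{a}}$.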

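To begin, the differentiability of $\mathbf{f}$ at $\mathbf{a}$ allows us to write

\[
(\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{a}+\Delta\mathbf{u}) = \boldsymbol{\varphi}(\mathbf{f}(\mathbf{a}+\Delta\mathbf{u})) = \boldsymbol{\varphi}\bigl(\underbrace{\mathbf{f}(\mathbf{a})}_{\mathbf{b}} + \underbrace{d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + \mathbf{o}(\Delta\mathbf{u})}_{\Delta\mathbf{s}}\bigr).
\]

We write $\mathbf{o}(\Delta\mathbf{u})$ here, instead of just $\mathbf{o}(1)$, to stress that this particular remainder
vanishes (to order greater than 1) with $\Delta\mathbf{u}$. Now use the differentiability of $\boldsymbol{\varphi}$ at
$\mathbf{f}(\mathbf{a}) = \mathbf{b}$ to expand the right-hand side. This yields a second remainder $\mathbf{o}(\Delta\mathbf{s})$ that
vanishes with $\Delta\mathbf{s}$ and is thus distinct from $\mathbf{o}(\Delta\mathbf{u})$:

\[
\begin{aligned}
\boldsymbol{\varphi}(\mathbf{b}) + d\boldsymbol{\varphi}_{\mathbf{b}}(\Delta\mathbf{s}) + \mathbf{o}(\Delta\mathbf{s}) &= \boldsymbol{\varphi}(\mathbf{f}(\mathbf{a})) + d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + \mathbf{o}(\Delta\mathbf{u})\bigr) + \mathbf{o}(\Delta\mathbf{s})\\
&= (\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{a}) + d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\bigr) + d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(\mathbf{o}(\Delta\mathbf{u})\bigr) + \mathbf{o}(\Delta\mathbf{s}).
\end{aligned}
\]

We used the linearity of $d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}$ to split the second term into two. We write the new
remainder as $\mathbf{o}(\Delta\mathbf{s})$ to indicate that it is a function of $\Delta\mathbf{s}$ rather than $\Delta\mathbf{u}$ and that, as
such, it vanishes to order greater than 1 with $\Delta\mathbf{s}$. At this stage we have

\[
(\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{a}+\Delta\mathbf{u}) - (\boldsymbol{\varphi}\circ\mathbf{f})(\mathbf{a}) = d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})\bigr) + d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\bigl(\mathbf{o}(\Delta\mathbf{u})\bigr) + \mathbf{o}(\Delta\mathbf{s}).
\]

It remains only to show that the last two terms on the right vanish to order greater
than 1 in $\Delta\mathbf{u}$.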

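Lemma 4.1. $d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}(\mathbf{o}(\Delta\mathbf{u})) = \mathbf{o}(\Delta\mathbf{u})$.

Proof. Because $d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}$ is a linear map, we know (Exercise 3.28, p. 104) there is a
positive constant $C$ for which $\|d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}(\mathbf{o}(\Delta\mathbf{u}))\| \le C\|\mathbf{o}(\Delta\mathbf{u})\|$. Therefore,

\[
\lim_{\Delta\mathbf{u}\to\mathbf{0}} \frac{\|d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}(\mathbf{o}(\Delta\mathbf{u}))\|}{\|\Delta\mathbf{u}\|} \le \lim_{\Delta\mathbf{u}\to\mathbf{0}} \frac{C\|\mathbf{o}(\Delta\mathbf{u})\|}{\|\Delta\mathbf{u}\|} = 0,
\]

by the definition of $\mathbf{o}(\Delta\mathbf{u})$. Thus $d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}(\mathbf{o}(\Delta\mathbf{u})) = \mathbf{o}(\Delta\mathbf{u})$. □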

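Lemma 4.2. Let $\Delta\mathbf{s} = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + \mathbf{o}(\Delta\mathbf{u})$; then $\Delta\mathbf{s} = \mathbf{O}(\Delta\mathbf{u})$.

Proof. The first term, $d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})$, is linear, so by Exercise 3.28.a, it vanishes at least
to order 1 in $\Delta\mathbf{u}$. The second term certainly vanishes at least to order 1 in $\Delta\mathbf{u}$, so
$\Delta\mathbf{s} = \mathbf{O}(\Delta\mathbf{u})$. □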

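Lemma 4.3. If $\Delta\mathbf{s} = \mathbf{O}(\Delta\mathbf{u})$, then $\mathbf{o}(\Delta\mathbf{s}) = \mathbf{o}(\Delta\mathbf{u})$.

Proof. We must show $\|\mathbf{o}(\Delta\mathbf{s})\|/\|\Delta\mathbf{u}\| \to 0$ as $\Delta\mathbf{u} \to \mathbf{0}$; note that there are different
variables in the numerator and the denominator. The two variables are linked, however:
$\Delta\mathbf{s} = \mathbf{O}(\Delta\mathbf{u})$. In fact, this hypothesis means $\Delta\mathbf{s} \to \mathbf{0}$ as $\Delta\mathbf{u} \to \mathbf{0}$, suggesting that
we write

\[
\frac{\|\mathbf{o}(\Delta\mathbf{s})\|}{\|\Delta\mathbf{u}\|} = \frac{\|\mathbf{o}(\Delta\mathbf{s})\|}{\|\Delta\mathbf{s}\|}\cdot\frac{\|\Delta\mathbf{s}\|}{\|\Delta\mathbf{u}\|}.
\]

Now the second factor on the right is bounded as $\Delta\mathbf{u} \to \mathbf{0}$, because $\Delta\mathbf{s} = \mathbf{O}(\Delta\mathbf{u})$.
The first factor tends to zero as $\Delta\mathbf{s} \to \mathbf{0}$, by definition of $\mathbf{o}(\Delta\mathbf{s})$. Because $\Delta\mathbf{s} \to \mathbf{0}$
as $\Delta\mathbf{u} \to \mathbf{0}$, it appears we have shown that $\|\mathbf{o}(\Delta\mathbf{s})\|/\|\Delta\mathbf{u}\|$ does indeed tend to 0 as
$\Delta\mathbf{u} \to \mathbf{0}$.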

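But $\Delta\mathbf{s}$ may be zero for some $\Delta\mathbf{u} \ne \mathbf{0}$, so the first factor $\|\mathbf{o}(\Delta\mathbf{s})\|/\|\Delta\mathbf{s}\|$ is undefined
and the argument fails. We need to avoid quotients here. Fortunately, Exercise 3.17
(p. 102) provides an alternate formulation of “little oh” without quotients.
The alternate formulation of the condition $\mathbf{o}(\Delta\mathbf{s}) = \mathbf{o}(\Delta\mathbf{u})$ that we seek to prove is
as follows. For any given $\varepsilon > 0$, we must be able to find a $\delta > 0$ so that

\[
\|\mathbf{o}(\Delta\mathbf{s})\| \le \varepsilon\|\Delta\mathbf{u}\| \quad\text{when}\quad \|\Delta\mathbf{u}\| < \delta.
\]

To find $\delta$, first note that $\Delta\mathbf{s} = \mathbf{O}(\Delta\mathbf{u})$ means, by definition, that there are positive
constants $\delta_1$ and $C$ for which $\|\Delta\mathbf{s}\| \le C\|\Delta\mathbf{u}\|$ when $\|\Delta\mathbf{u}\| < \delta_1$. The alternate
formulation of “little oh” implies that, for the $\varepsilon$ already given, we can choose $\delta_2$ so
that

\[
\|\mathbf{o}(\Delta\mathbf{s})\| \le \frac{\varepsilon}{C}\|\Delta\mathbf{s}\| \quad\text{when}\quad \|\Delta\mathbf{s}\| < \delta_2.
\]

Finally, if we let $\delta$ be the smaller of $\delta_1$ and $\delta_2/C$, then $\|\Delta\mathbf{u}\| < \delta$ implies first that
$\|\Delta\mathbf{s}\| \le C\|\Delta\mathbf{u}\| < C\delta \le \delta_2$ and consequently that

\[
\|\mathbf{o}(\Delta\mathbf{s})\| \le \frac{\varepsilon}{C}\|\Delta\mathbf{s}\| \le \varepsilon\|\Delta\mathbf{u}\|. \qquad □
\]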

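To complete the proof of the theorem, we just need to combine the results of
Lemmas 4.2 and 4.3 to conclude that if $\Delta\mathbf{s} = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + \mathbf{o}(\Delta\mathbf{u})$, then $\mathbf{o}(\Delta\mathbf{s}) = \mathbf{o}(\Delta\mathbf{u})$. □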


Example: a chain of 
maps of the plane 


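Here is an example that shows how the chain rule works for a pair of maps of the
plane. The first, $\mathbf{f}$, is the polar coordinate map and the second, $\boldsymbol{\varphi}$, is the conformal
quadratic map; these are Examples 1 and 2 in Chapter 4.2.

\[
\boldsymbol{\varphi}: \begin{cases} x = u^2 - v^2,\\ y = 2uv, \end{cases}
\qquad
\mathbf{f}: \begin{cases} u = \rho\cos\varphi,\\ v = \rho\sin\varphi. \end{cases}
\]

Their composite is

\[
\boldsymbol{\varphi}\circ\mathbf{f}: \begin{cases} x = \rho^2\cos^2\varphi - \rho^2\sin^2\varphi = \rho^2\cos 2\varphi,\\ y = 2\rho\cos\varphi\cdot\rho\sin\varphi = \rho^2\sin 2\varphi. \end{cases}
\]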

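With these formulas for the component functions of $\boldsymbol{\varphi}\circ\mathbf{f}$, we can compute the
derivative directly, without using the chain rule. At an arbitrary point $(\rho,\varphi)$, the
derivative is

\[
d(\boldsymbol{\varphi}\circ\mathbf{f})_{(\rho,\varphi)} = \begin{pmatrix} 2\rho\cos 2\varphi & -2\rho^2\sin 2\varphi\\ 2\rho\sin 2\varphi & 2\rho^2\cos 2\varphi \end{pmatrix}.
\]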

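Let us compare this with the derivative obtained with the chain rule. We start with
the derivatives of the individual maps $\boldsymbol{\varphi}$ and $\mathbf{f}$:

\[
d\boldsymbol{\varphi}_{(u,v)} = \begin{pmatrix} 2u & -2v\\ 2v & 2u \end{pmatrix}, \qquad
d\mathbf{f}_{(\rho,\varphi)} = \begin{pmatrix} \cos\varphi & -\rho\sin\varphi\\ \sin\varphi & \rho\cos\varphi \end{pmatrix}.
\]

These have been evaluated at arbitrary points in the domains of the maps. But, in
the chain rule, $d\boldsymbol{\varphi}$ must be evaluated at $\mathbf{f}(\rho,\varphi) = (\rho\cos\varphi, \rho\sin\varphi)$. Thus,

\[
d\boldsymbol{\varphi}_{\mathbf{f}(\rho,\varphi)} = \begin{pmatrix} 2u & -2v\\ 2v & 2u \end{pmatrix}\Bigg|_{\substack{u=\rho\cos\varphi\\ v=\rho\sin\varphi}}
= \begin{pmatrix} 2\rho\cos\varphi & -2\rho\sin\varphi\\ 2\rho\sin\varphi & 2\rho\cos\varphi \end{pmatrix},
\]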


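and the matrix product we seek is

\[
\begin{aligned}
d\boldsymbol{\varphi}_{\mathbf{f}(\rho,\varphi)}\circ d\mathbf{f}_{(\rho,\varphi)}
&= \begin{pmatrix} 2\rho\cos\varphi & -2\rho\sin\varphi\\ 2\rho\sin\varphi & 2\rho\cos\varphi \end{pmatrix}
\begin{pmatrix} \cos\varphi & -\rho\sin\varphi\\ \sin\varphi & \rho\cos\varphi \end{pmatrix}\\
&= \begin{pmatrix} 2\rho\cos^2\varphi - 2\rho\sin^2\varphi & -2\rho^2\cos\varphi\sin\varphi - 2\rho^2\sin\varphi\cos\varphi\\ 2\rho\sin\varphi\cos\varphi + 2\rho\cos\varphi\sin\varphi & -2\rho^2\sin^2\varphi + 2\rho^2\cos^2\varphi \end{pmatrix}\\
&= \begin{pmatrix} 2\rho\cos 2\varphi & -2\rho^2\sin 2\varphi\\ 2\rho\sin 2\varphi & 2\rho^2\cos 2\varphi \end{pmatrix}
= d(\boldsymbol{\varphi}\circ\mathbf{f})_{(\rho,\varphi)}.
\end{aligned}
\]

The chain rule evidently holds in this case.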
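
The same comparison can be carried out numerically. The sketch below encodes the three matrices of this example (the derivative of the polar coordinate map, the derivative of the quadratic map, and the derivative of the composite computed directly) and reports the largest entrywise difference between the matrix product and the direct derivative. The function names are our own choices for illustration.

```python
import math

def polar(rho, phi):
    """The polar coordinate map f: (rho, phi) -> (u, v)."""
    return (rho * math.cos(phi), rho * math.sin(phi))

def d_polar(rho, phi):
    """Matrix of df at (rho, phi)."""
    c, s = math.cos(phi), math.sin(phi)
    return ((c, -rho * s), (s, rho * c))

def d_quadratic(u, v):
    """Matrix of the derivative of x = u^2 - v^2, y = 2uv at (u, v)."""
    return ((2 * u, -2 * v), (2 * v, 2 * u))

def d_composite(rho, phi):
    """Derivative of x = rho^2 cos 2phi, y = rho^2 sin 2phi, computed directly."""
    c2, s2 = math.cos(2 * phi), math.sin(2 * phi)
    return ((2 * rho * c2, -2 * rho**2 * s2), (2 * rho * s2, 2 * rho**2 * c2))

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def chain_rule_gap(rho, phi):
    """Largest entrywise difference between the chain-rule product and d_composite."""
    u, v = polar(rho, phi)  # the quadratic map's derivative is evaluated at f(rho, phi)
    product = matmul2(d_quadratic(u, v), d_polar(rho, phi))
    direct = d_composite(rho, phi)
    return max(abs(product[i][j] - direct[i][j]) for i in range(2) for j in range(2))
```

At any point the gap should be zero up to floating-point rounding. Note that the key step is the one stressed in the text: `d_quadratic` is evaluated at the image point `polar(rho, phi)`, not at `(rho, phi)` itself.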

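Example: arbitrary maps of the plane

For a second example, we consider two maps of the plane defined by arbitrary
component functions:

\[
\boldsymbol{\varphi}: \begin{cases} x = \varphi(s,t),\\ y = \psi(s,t), \end{cases}
\qquad
\mathbf{f}: \begin{cases} s = f(u,v),\\ t = g(u,v). \end{cases}
\]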

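The derivatives of these maps are the matrices

\[
d\boldsymbol{\varphi}_{(s,t)} = \begin{pmatrix} \dfrac{\partial\varphi}{\partial s} & \dfrac{\partial\varphi}{\partial t}\\[2ex] \dfrac{\partial\psi}{\partial s} & \dfrac{\partial\psi}{\partial t} \end{pmatrix}, \qquad
d\mathbf{f}_{(u,v)} = \begin{pmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v}\\[2ex] \dfrac{\partial g}{\partial u} & \dfrac{\partial g}{\partial v} \end{pmatrix},
\]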


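whose product is the derivative of the composite map $\boldsymbol{\varphi}\circ\mathbf{f}$:

\[
d(\boldsymbol{\varphi}\circ\mathbf{f})_{(u,v)} = \begin{pmatrix}
\dfrac{\partial\varphi}{\partial s}\dfrac{\partial f}{\partial u} + \dfrac{\partial\varphi}{\partial t}\dfrac{\partial g}{\partial u} &
\dfrac{\partial\varphi}{\partial s}\dfrac{\partial f}{\partial v} + \dfrac{\partial\varphi}{\partial t}\dfrac{\partial g}{\partial v}\\[2ex]
\dfrac{\partial\psi}{\partial s}\dfrac{\partial f}{\partial u} + \dfrac{\partial\psi}{\partial t}\dfrac{\partial g}{\partial u} &
\dfrac{\partial\psi}{\partial s}\dfrac{\partial f}{\partial v} + \dfrac{\partial\psi}{\partial t}\dfrac{\partial g}{\partial v}
\end{pmatrix}.
\]

When we write out the components of the composite map,

\[
\boldsymbol{\varphi}\circ\mathbf{f}: \begin{cases} x = \varphi(f(u,v), g(u,v)),\\ y = \psi(f(u,v), g(u,v)), \end{cases}
\]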

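we express $x$ and $y$ directly as functions of $u$ and $v$; if we use $\partial x/\partial u$, and so forth,
to denote the partial derivatives of these functions, then

\[
d(\boldsymbol{\varphi}\circ\mathbf{f})_{(u,v)} = \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v}\\[2ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix}.
\]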

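Chain rule for component functions

We now have two formulas for the derivative $d(\boldsymbol{\varphi}\circ\mathbf{f})_{(u,v)}$. Together they give us the
chain rule for the individual component functions:

\[
\begin{aligned}
\frac{\partial x}{\partial u} &= \frac{\partial\varphi}{\partial s}\frac{\partial f}{\partial u} + \frac{\partial\varphi}{\partial t}\frac{\partial g}{\partial u}, &
\frac{\partial x}{\partial v} &= \frac{\partial\varphi}{\partial s}\frac{\partial f}{\partial v} + \frac{\partial\varphi}{\partial t}\frac{\partial g}{\partial v},\\[1ex]
\frac{\partial y}{\partial u} &= \frac{\partial\psi}{\partial s}\frac{\partial f}{\partial u} + \frac{\partial\psi}{\partial t}\frac{\partial g}{\partial u}, &
\frac{\partial y}{\partial v} &= \frac{\partial\psi}{\partial s}\frac{\partial f}{\partial v} + \frac{\partial\psi}{\partial t}\frac{\partial g}{\partial v}.
\end{aligned}
\]

Derivative of the inverse

Local orientation and volume magnification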


There are clear patterns in these four equations that allow us to see what form the
component derivatives will take in the general case. We start with two variables
($u$ and $v$) that first determine two others ($s$ and $t$) directly and then two more ($x$
and $y$) indirectly. The partial derivative of $x$ with respect to $u$, for example, must
have terms that take into account how $x$ varies with $u$ via $s$ (viz. $\partial\varphi/\partial s\cdot\partial f/\partial u$)
and via $t$ ($\partial\varphi/\partial t\cdot\partial g/\partial u$).

In the general case, $p$ variables $(u_1,\ldots,u_p)$ first determine the values of $q$ new
variables $(s_1,\ldots,s_q)$ directly, and then $r$ additional variables $(x_1,\ldots,x_r)$ indirectly:

\[
x_k = \varphi_k(s_1,\ldots,s_q) \quad\text{and}\quad s_j = f_j(u_1,\ldots,u_p),
\]

for $k = 1,\ldots,r$ and $j = 1,\ldots,q$. Thus, a partial derivative of $x_k$, for example, must
take into account how $x_k$ varies with each $u_i$ via each of the $q$ intermediate variables
$s_j$:

\[
\frac{\partial x_k}{\partial u_i} = \frac{\partial\varphi_k}{\partial s_1}\frac{\partial f_1}{\partial u_i} + \cdots + \frac{\partial\varphi_k}{\partial s_q}\frac{\partial f_q}{\partial u_i}.
\]

The following theorem summarizes this discussion, with the single variable $y$ replacing
the various $x_1,\ldots,x_r$.


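Theorem 4.10. Suppose the functions $y = \varphi(s_1,\ldots,s_q)$ and $s_j = f_j(u_1,\ldots,u_p)$, with
$j = 1,\ldots,q$, are all differentiable; then

\[
\frac{\partial y}{\partial u_i} = \sum_{j=1}^{q} \frac{\partial\varphi}{\partial s_j}\frac{\partial f_j}{\partial u_i}, \qquad i = 1,\ldots,p.
\]

□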
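
As an illustration of Theorem 4.10, the following sketch compares the sum formula with a central finite-difference estimate of the partial derivatives. The particular functions (two intermediate variables, two input variables) are hypothetical choices made only for this check; they do not appear in the text.

```python
import math

# Hypothetical intermediate variables s_j = f_j(u1, u2), j = 1, 2.
def s(u1, u2):
    return (u1 + u2**2, math.sin(u1 * u2))

def ds(u1, u2):
    """ds[j][i] is the partial derivative of s_j with respect to u_i."""
    c = math.cos(u1 * u2)
    return ((1.0, 2 * u2), (u2 * c, u1 * c))

# Hypothetical outer function y = phi(s1, s2).
def phi(s1, s2):
    return s1**2 * s2

def dphi(s1, s2):
    """Partial derivatives of phi with respect to s1 and s2."""
    return (2 * s1 * s2, s1**2)

def dy_du(u1, u2):
    """Partial derivatives of y by the sum over intermediate variables."""
    outer = dphi(*s(u1, u2))  # evaluated at the intermediate point s(u)
    inner = ds(u1, u2)
    return tuple(sum(outer[j] * inner[j][i] for j in range(2)) for i in range(2))

def dy_du_numeric(u1, u2, h=1e-6):
    """Central finite-difference estimate of the same partial derivatives."""
    def y(a, b):
        return phi(*s(a, b))
    return (
        (y(u1 + h, u2) - y(u1 - h, u2)) / (2 * h),
        (y(u1, u2 + h) - y(u1, u2 - h)) / (2 * h),
    )
```

The two results should agree to within the accuracy of the finite-difference estimate, which depends on the step `h`.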


The following corollary of the chain rule says that the derivative of the inverse 
(of a given map) is the inverse of the derivative (of that map). 

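Corollary 4.11 Suppose $\mathbf{f}: U^n \to S^n$ is invertible, and $\mathbf{f}^{-1}: S^n \to U^n$ is its inverse.
Suppose that both $\mathbf{f}$ and $\mathbf{f}^{-1}$ are differentiable; then

\[
(d\mathbf{f}_{\mathbf{u}})^{-1} = d(\mathbf{f}^{-1})_{\mathbf{f}(\mathbf{u})}.
\]

Proof. Let $\mathbf{I}: U^n \to U^n$ be the identity map, $\mathbf{I}(\mathbf{u}) = \mathbf{u}$. Then $\mathbf{f}^{-1}\circ\mathbf{f} = \mathbf{I}$; by the chain
rule

\[
I = d\mathbf{I}_{\mathbf{u}} = d(\mathbf{f}^{-1})_{\mathbf{f}(\mathbf{u})}\circ d\mathbf{f}_{\mathbf{u}},
\]

where $I$ is the linear map represented by the $n\times n$ identity matrix. The equation
implies that $d(\mathbf{f}^{-1})_{\mathbf{f}(\mathbf{u})}$ is the inverse of $d\mathbf{f}_{\mathbf{u}}$. □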
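
A quick numerical check of the corollary is sketched below. The map used, $x = e^u\cos v$, $y = e^u\sin v$, is a hypothetical example chosen for illustration (it is invertible when $v$ is restricted to $-\pi < v < \pi$); the function names are ours. The code inverts the matrix of the derivative and compares it with the derivative of the inverse map evaluated at the image point.

```python
import math

def exp_polar(u, v):
    """Hypothetical example map (u, v) -> (e^u cos v, e^u sin v)."""
    return (math.exp(u) * math.cos(v), math.exp(u) * math.sin(v))

def d_exp_polar(u, v):
    """Matrix of the derivative of exp_polar at (u, v)."""
    e, c, s = math.exp(u), math.cos(v), math.sin(v)
    return ((e * c, -e * s), (e * s, e * c))

def d_exp_polar_inverse(x, y):
    """Derivative of the inverse (x, y) -> (log sqrt(x^2 + y^2), angle of (x, y))."""
    r2 = x * x + y * y
    return ((x / r2, y / r2), (-y / r2, x / r2))

def inverse_2x2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def inverse_gap(u, v):
    """Largest entrywise difference between (df_u)^(-1) and d(f^(-1)) at f(u)."""
    lhs = inverse_2x2(d_exp_polar(u, v))
    rhs = d_exp_polar_inverse(*exp_polar(u, v))  # evaluated at the image point
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

As in the corollary, the two derivatives can be compared only after they are expressed at corresponding points: the derivative of the inverse is evaluated at `exp_polar(u, v)`.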

The corollary focuses our attention on maps $\mathbf{f}: U^n \to S^n$ whose source and target
have the same dimension. We have already studied some examples when $n = 2$ in the
second section of this chapter; in particular, we saw we could use the area multiplier
of the derivative to assign a local area multiplier (p. 115) to the map itself at each
point. The following definition carries these ideas over to higher dimensions.

Definition 4.7 Suppose $\mathbf{f}: U^n \to \mathbb{R}^n$ is differentiable; the local volume multiplier
of $\mathbf{f}$ at $\mathbf{a}$, written $J_{\mathbf{f}}(\mathbf{a})$, is $\det d\mathbf{f}_{\mathbf{a}}$, the volume multiplier of the derivative of $\mathbf{f}$ at $\mathbf{a}$.
Also, $\mathbf{f}$ is orientation-preserving or reversing at $\mathbf{a}$ according as its derivative $d\mathbf{f}_{\mathbf{a}}$ is
orientation-preserving or reversing.

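The chain rule implies that the local volume multiplier of a composite is the product
of the individual multipliers, as we would expect.

Corollary 4.12 If $\mathbf{f}: U^n \to S^n$ and $\boldsymbol{\varphi}: S^n \to \mathbb{R}^n$ are differentiable, then

\[
J_{\boldsymbol{\varphi}\circ\mathbf{f}}(\mathbf{a}) = J_{\boldsymbol{\varphi}}(\mathbf{f}(\mathbf{a}))\,J_{\mathbf{f}}(\mathbf{a}).
\]

Proof. The proof is just a consequence of the fact that the determinant of a product
is the product of the determinants:

\[
\begin{aligned}
J_{\boldsymbol{\varphi}\circ\mathbf{f}}(\mathbf{a}) = \det d(\boldsymbol{\varphi}\circ\mathbf{f})_{\mathbf{a}} &= \det\bigl(d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\circ d\mathbf{f}_{\mathbf{a}}\bigr)\\
&= \det d\boldsymbol{\varphi}_{\mathbf{f}(\mathbf{a})}\,\det d\mathbf{f}_{\mathbf{a}} = J_{\boldsymbol{\varphi}}(\mathbf{f}(\mathbf{a}))\,J_{\mathbf{f}}(\mathbf{a}). \qquad □
\end{aligned}
\]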


The traditional name for $J_{\mathbf{f}}(\mathbf{a})$ is the Jacobian; hence the letter $J$. In this context,
the matrix $d\mathbf{f}_{\mathbf{a}}$ is the Jacobian matrix (the Jacobian itself is always the determinant).
The Jacobian plays a central role in multiple integrals. We show it is the
analogue of the factor $\varphi'(s)$ that appears in the transformation $dx = \varphi'(s)\,ds$ of differentials
(pp. 3-5). For that reason, we write it another way (also traditional) that
suggests the connection with derivatives more directly. To illustrate, let

\[
\mathbf{f}: \begin{cases} x = f(u,v),\\ y = g(u,v); \end{cases}
\]

then our alternate notation for the Jacobian of $\mathbf{f}$ is

\[
J(u,v) = \frac{\partial(x,y)}{\partial(u,v)} = \frac{\partial(f,g)}{\partial(u,v)}.
\]

Here we write $J(u,v)$ without the subscript for the map $\mathbf{f}$. This is frequently done,
and it directs attention to the Jacobian as a function of the input variables. The
second and third expressions are the more common ones; they remind us that the
Jacobian involves partial derivatives. The Jacobian of the polar coordinate map is

\[
\frac{\partial(x,y)}{\partial(r,\theta)} =
\begin{vmatrix} \dfrac{\partial}{\partial r}\,r\cos\theta & \dfrac{\partial}{\partial\theta}\,r\cos\theta\\[2ex] \dfrac{\partial}{\partial r}\,r\sin\theta & \dfrac{\partial}{\partial\theta}\,r\sin\theta \end{vmatrix}
= \begin{vmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{vmatrix} = r.
\]

This agrees with our original determination of the local area multiplier for polar
coordinates (p. 115).

Volume magnification in a chain

The Jacobian

Jacobian notation

Chain rule for Jacobians

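Definition 4.8 Suppose $\mathbf{f}: U^n \to \mathbb{R}^n$ is differentiable and has component functions

\[
x_i = f_i(u_1,\ldots,u_n), \qquad i = 1,\ldots,n.
\]

Then the Jacobian of $\mathbf{f}$ is the determinant

\[
\frac{\partial(x_1,\ldots,x_n)}{\partial(u_1,\ldots,u_n)} = \det d\mathbf{f}_{\mathbf{u}}.
\]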


The following are restatements of Corollaries 4.11 and 4.12 using Jacobian notation.
Because they deal with inverses and with Jacobians, it becomes practical to replace
function names by names of output variables (e.g., to replace $s_i = f_i(u_1,\ldots,u_n)$ by
$s_i = s_i(u_1,\ldots,u_n)$).

Corollary 4.13 If a map and its inverse are both differentiable,

\[
\begin{cases} s_1 = s_1(u_1,\ldots,u_n),\\ \quad\vdots\\ s_n = s_n(u_1,\ldots,u_n), \end{cases}
\qquad
\begin{cases} u_1 = u_1(s_1,\ldots,s_n),\\ \quad\vdots\\ u_n = u_n(s_1,\ldots,s_n), \end{cases}
\]

then

\[
\frac{\partial(u_1,\ldots,u_n)}{\partial(s_1,\ldots,s_n)} = \frac{1}{\dfrac{\partial(s_1,\ldots,s_n)}{\partial(u_1,\ldots,u_n)}}.
\]

□

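Corollary 4.14 If the following are differentiable,

\[
\boldsymbol{\varphi}: \begin{cases} x_1 = x_1(s_1,\ldots,s_n),\\ \quad\vdots\\ x_n = x_n(s_1,\ldots,s_n), \end{cases}
\qquad
\begin{cases} s_1 = s_1(u_1,\ldots,u_n),\\ \quad\vdots\\ s_n = s_n(u_1,\ldots,u_n), \end{cases}
\]

then

\[
\frac{\partial(x_1,\ldots,x_n)}{\partial(u_1,\ldots,u_n)} = \frac{\partial(x_1,\ldots,x_n)}{\partial(s_1,\ldots,s_n)}\,\frac{\partial(s_1,\ldots,s_n)}{\partial(u_1,\ldots,u_n)}.
\]

□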

These results obviously remind us of the one-variable cases: if $u = u(s)$ is the inverse
of $s = s(u)$, and $x = x(s)$, then

\[
\frac{du}{ds} = \frac{1}{ds/du} \qquad\text{and}\qquad \frac{dx}{du} = \frac{dx}{ds}\,\frac{ds}{du}.
\]

Local area multiplier 
on a surface patch 


Although Jacobians are determinants and consequently involve an equal number
of input and output variables, they can appear in other circumstances. For example,
the parametrization of a surface patch, $\mathbf{f}: U^2 \to \mathbb{R}^3$, involves three functions of two
real variables:

\[
\mathbf{f}: \begin{cases} x = f(u,v),\\ y = g(u,v),\\ z = h(u,v). \end{cases}
\]

The linear map $d\mathbf{f}_{\mathbf{a}}: \mathbb{R}^2 \to \mathbb{R}^3$ has an area magnification factor that is a kind of
“Pythagorean formula” (Theorem 2.25, p. 55). The formula involves the $2\times 2$ minors
of $d\mathbf{f}_{\mathbf{a}}$, which can be written in a simple and direct way using Jacobians; the
result is given in the following definition.

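Definition 4.9 The local area multiplier for the parametrized surface patch $\mathbf{f}:
U^2 \to \mathbb{R}^3 : (u,v) \mapsto (x,y,z)$ is

\[
M(u,v) = \sqrt{\left[\frac{\partial(y,z)}{\partial(u,v)}\right]^2 + \left[\frac{\partial(z,x)}{\partial(u,v)}\right]^2 + \left[\frac{\partial(x,y)}{\partial(u,v)}\right]^2}.
\]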


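For example, the crosscap we analyzed earlier (pp. 126-128) has the parametrization

\[
\mathbf{f}: \begin{cases} x = u,\\ y = uv,\\ z = -v^2, \end{cases}
\qquad
d\mathbf{f}_{\mathbf{u}} = \begin{pmatrix} 1 & 0\\ v & u\\ 0 & -2v \end{pmatrix},
\]

so the local area multiplier is

\[
\sqrt{\begin{vmatrix} v & u\\ 0 & -2v \end{vmatrix}^2 + \begin{vmatrix} 0 & -2v\\ 1 & 0 \end{vmatrix}^2 + \begin{vmatrix} 1 & 0\\ v & u \end{vmatrix}^2}
= \sqrt{4v^4 + 4v^2 + u^2}.
\]

At $(u,v) = (1,0)$, the multiplier is 1. This agrees with what we saw earlier: near
$(1,0)$, $\mathbf{f}$ preserves areas. At $(u,v) = (-1,1)$, the multiplier is 3. This too agrees with
our earlier analysis: near $(-1,1)$, $\mathbf{f}$ triples areas.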
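
The computation of Definition 4.9 for the crosscap can be sketched in a few lines. The code below builds the $3\times 2$ matrix of the derivative, forms the three $2\times 2$ minors in the order $(y,z)$, $(z,x)$, $(x,y)$, and takes the square root of the sum of their squares. The function names are our own.

```python
import math

def crosscap_derivative(u, v):
    """Rows hold the partials of x = u, y = u*v, z = -v^2 with respect to (u, v)."""
    return ((1.0, 0.0), (v, u), (0.0, -2.0 * v))

def minor(rows, i, j):
    """Determinant of the 2x2 matrix formed from rows i and j, in that order."""
    (a, b), (c, d) = rows[i], rows[j]
    return a * d - b * c

def area_multiplier(u, v):
    """Local area multiplier M(u, v) as the 'Pythagorean' combination of minors."""
    m = crosscap_derivative(u, v)
    minors = (minor(m, 1, 2), minor(m, 2, 0), minor(m, 0, 1))  # (y,z), (z,x), (x,y)
    return math.sqrt(sum(x * x for x in minors))

def area_multiplier_closed_form(u, v):
    """The closed form sqrt(4 v^4 + 4 v^2 + u^2) obtained in the text."""
    return math.sqrt(4 * v**4 + 4 * v**2 + u**2)
```

Calling `area_multiplier(1.0, 0.0)` and `area_multiplier(-1.0, 1.0)` reproduces the two values found above, and the two functions agree at any other point as well.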


The chain rule gives us a way to prove the mean-value theorem for maps of the
form $\mathbf{f}: U^p \to \mathbb{R}^q$. The mean-value theorem says that, for any two points $\mathbf{a}$ and $\mathbf{b}$ in
$U^p$,

\[
\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\| \le M\|\mathbf{b} - \mathbf{a}\|,
\]

where $M$ is a bound on the size of the derivative of $\mathbf{f}$ at points along the line from $\mathbf{a}$
to $\mathbf{b}$. We need to establish what the “size” of the derivative is.

In Theorem 3.8 and the discussion preceding it (pp. 76-77), we had $q = 1$.
Therefore we were able to identify the derivative with a vector and the size of the
derivative with the magnitude of that vector. When $q \ge 2$, the derivative is a more
general linear map; to measure its size, we use its norm (see Exercise 3.28.b, p. 104).
The norm of $d\mathbf{f}_{\mathbf{u}}$ is

\[
|\!|\!|d\mathbf{f}_{\mathbf{u}}|\!|\!| = \max_{\|\Delta\mathbf{v}\| = 1} \|d\mathbf{f}_{\mathbf{u}}(\Delta\mathbf{v})\|;
\]

we have $\|d\mathbf{f}_{\mathbf{u}}(\Delta\mathbf{v})\| \le |\!|\!|d\mathbf{f}_{\mathbf{u}}|\!|\!|\,\|\Delta\mathbf{v}\|$ for all $\Delta\mathbf{v}$ in $\mathbb{R}^p$. The norm of a linear map is the
largest amount by which it stretches any vector.

Example: the crosscap

A mean-value theorem for maps

The norm of a derivative

Write the difference as an integral

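Theorem 4.15 (Mean-value theorem). Suppose the map $\mathbf{f}: U^p \to \mathbb{R}^q$ is continuously
differentiable and $U^p$ contains every point on the line segment from $\mathbf{a}$ to $\mathbf{b}$.
Then

\[
\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\| \le \max_{\mathbf{u}} |\!|\!|d\mathbf{f}_{\mathbf{u}}|\!|\!|\;\|\mathbf{b} - \mathbf{a}\|,
\]

where the maximum is taken over all points $\mathbf{u}$ on the line from $\mathbf{a}$ to $\mathbf{b}$.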
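
To see the inequality at work, here is a small numerical sketch for the quadratic map $x = u^2 - v^2$, $y = 2uv$ discussed earlier. The helper names, the sampling of the segment, and the closed-form expression used for the norm of a $2\times 2$ matrix (its largest singular value) are choices made for this illustration. Sampling the segment gives, in general, only a lower estimate of the true maximum; for this map the norm is largest at an endpoint, which the sample includes.

```python
import math

def quadratic(u, v):
    """The quadratic map (u, v) -> (u^2 - v^2, 2uv)."""
    return (u * u - v * v, 2 * u * v)

def d_quadratic(u, v):
    """Matrix of its derivative at (u, v)."""
    return ((2 * u, -2 * v), (2 * v, 2 * u))

def operator_norm(m):
    """Largest singular value of a 2x2 matrix, i.e. its maximum stretch factor."""
    (a, b), (c, d) = m
    t = a * a + b * b + c * c + d * d      # trace of (M transpose) M
    det = a * d - b * c
    disc = max(t * t - 4 * det * det, 0.0)  # guard against tiny negative rounding
    return math.sqrt((t + math.sqrt(disc)) / 2)

def mean_value_sides(a, b, samples=101):
    """Return (||f(b) - f(a)||, sampled max of the norm of df times ||b - a||)."""
    du = (b[0] - a[0], b[1] - a[1])
    fa, fb = quadratic(*a), quadratic(*b)
    lhs = math.hypot(fb[0] - fa[0], fb[1] - fa[1])
    biggest = max(
        operator_norm(d_quadratic(a[0] + t * du[0], a[1] + t * du[1]))
        for t in (k / (samples - 1) for k in range(samples))
    )
    return lhs, biggest * math.hypot(*du)
```

For example, with `a = (1.0, 0.0)` and `b = (0.0, 2.0)` the left side is the distance between the image points and the right side is the sampled bound; the theorem asserts the first does not exceed the second.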

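Proof. We begin by constructing the analogue of the error formula

\[
\int_0^1 f'(a + t\,\Delta x)\,\Delta x\,dt = f(a + \Delta x) - f(a)
\]

with which we began the discussion of Taylor’s theorem (cf. pp. 78-79). Set $\Delta\mathbf{u} =
\mathbf{b} - \mathbf{a}$; then $\mathbf{u}(t) = \mathbf{a} + t\,\Delta\mathbf{u}$, $0 \le t \le 1$, is the line segment from $\mathbf{a}$ to $\mathbf{b}$. All the points
on this segment are also in $U^p$; therefore the map

\[
\boldsymbol{\varphi}(t) = \mathbf{f}(\mathbf{u}(t)) = \mathbf{f}(\mathbf{a} + t\,\Delta\mathbf{u})
\]

is continuously differentiable on $[0,1]$. By the chain rule,

\[
\boldsymbol{\varphi}'(t) = d\boldsymbol{\varphi}_t = d\mathbf{f}_{\mathbf{u}(t)}(d\mathbf{u}_t) = d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}(\Delta\mathbf{u});
\]

we have used $d\mathbf{u}_t = \mathbf{u}'(t) = \Delta\mathbf{u}$. Thus

\[
\begin{aligned}
\int_0^1 d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}(\Delta\mathbf{u})\,dt = \int_0^1 \boldsymbol{\varphi}'(t)\,dt &= \boldsymbol{\varphi}(1) - \boldsymbol{\varphi}(0)\\
&= \mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}) = \mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}).
\end{aligned}
\]

This is, in fact, just Taylor’s formula with remainder in degree 0 for the map $\mathbf{f}$. It
implies

\[
\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\| = \left\|\int_0^1 d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}(\Delta\mathbf{u})\,dt\right\| \le \int_0^1 \|d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}(\Delta\mathbf{u})\|\,dt.
\]

Because $\Delta\mathbf{u} = \mathbf{b} - \mathbf{a}$ is fixed, we have

\[
\|d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}(\Delta\mathbf{u})\| \le \max_{0\le t\le 1} \|d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}(\Delta\mathbf{u})\| \le \max_{0\le t\le 1} |\!|\!|d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}|\!|\!|\;\|\Delta\mathbf{u}\|.
\]

The right-hand side is independent of $t$, so (with $\mathbf{u} = \mathbf{a} + t\,\Delta\mathbf{u}$)

\[
\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\| \le \max_{0\le t\le 1} |\!|\!|d\mathbf{f}_{\mathbf{a}+t\Delta\mathbf{u}}|\!|\!|\;\|\Delta\mathbf{u}\| \le \max_{\mathbf{u}} |\!|\!|d\mathbf{f}_{\mathbf{u}}|\!|\!|\;\|\mathbf{b} - \mathbf{a}\|. \qquad □
\]

Exercises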

4.1. Determine the derivative of the given function $z = f(x,y)$ at the given point
(x,y) = (a,b). Write the derivative as a linear function of the variables $\Delta x =
x - a$, $\Delta y = y - b$.

a. $f(x,y) = 7x - 3y + 9$, $(a,b) = (4,5)$.

b. $f(x,y) = 1 - \cos x + y^2/2$, $(a,b) = (0,1)$.

c. $f(x,y) = \arctan(y/x)$, $(a,b) = (4,-3)$.

d. $f(x,y) = \alpha x^2 + 2\beta xy + \gamma y^2 + \delta x + \varepsilon y + \kappa$, $(a,b)$ arbitrary (and $\alpha$, $\beta$, $\gamma$, $\delta$,
$\varepsilon$, $\kappa$ are all constants).

4.2. a. Let $f(x,y) = x^2 - y^2$. Plot the graph of $z = f(x,y)$ in a window centered at
$(x,y) = (2,-1)$, making the window small enough for the graph to appear
to be a flat plane.

b. Confirm that the plane appearing in (a) is the graph of the derivative
$df_{(2,-1)}$. Because the derivative is expressed in terms of the displacement
variables $\Delta x = x - 2$, $\Delta y = y + 1$, it is necessary to include the constant
term $f(2,-1) = 3$ in the equation for the second graph.

c. Construct a contour plot of $f(x,y)$ in two windows centered at $(x,y) =
(2,-1)$. Make the first window 2.0 units on a side, and make the second
0.2. Confirm that $f$ changes its appearance from nonlinear to linear
from the first to the second window, and confirm that the level curves of $f$
are indistinguishable from the level curves of its derivative there.

4.3. a. Determine the equation of the tangent plane to the graph of $z = f(x,y) =
\sin x \sin y$ at the point $(x,y) = (\pi/3, -\pi/2)$.

b. Sketch, together, the graph of $f$ and this tangent plane over the square
$0 \le x \le \pi$, $-\pi \le y \le 0$.

c. Select a smaller square centered at $(\pi/3, -\pi/2)$ on which the graph of $f$
and this tangent plane become indistinguishable; sketch the two surfaces
over that square.

d. Sketch together contour plots of $f$ and the derivative of $f$ at $(\pi/3, -\pi/2)$
on the square $0 \le x \le \pi$, $-\pi \le y \le 0$. Sketch them again in the smaller
square you selected in part (c).

4.4. a. Explain what $\cos t - 1 = o(1)$ means, and then show that it is true. (Suggestion:
Use l’Hôpital’s rule.)

b. Is $\sin t = o(1)$ true? Explain.

c. Explain what $\sin t - t = o(2)$ means, and then show that it is true. Is it true
that $\sin t - t = o(3)$? Explain. Is $\sin t - t = O(3)$ true? Explain.

4.5. Show $f(x) = x^2\sin(1/x)$ is differentiable at $x = 0$, that $f'(0) = 0$, and even
that $f(\Delta x) = f(0) + f'(0)\,\Delta x + O(2)$. Show that, nevertheless, $f''(0)$ does not
exist. Moreover, show that $f'(x)$ is not a continuous function: If $x_n = 1/(2\pi n)$,
then $f'(x_n) = -1$; hence $f'(x_n) \not\to f'(0)$ even though $x_n \to 0$.

4.6. Let $z = f(x,y)$ be the “manta ray” function (pp. 108-109). Show analytically
that $z = 0$ is not the tangent plane to $f$ at the origin by showing the
gap $f(\Delta x,\Delta y) - f(0,0)$ does not vanish to order greater than 1; that is, show
directly (cf. Definition 3.14, p. 98) that the ratio

\[
\frac{f(\Delta x,\Delta y) - f(0,0)}{\|(\Delta x,\Delta y)\|} = \frac{(\Delta x)^2\,\Delta y/\bigl((\Delta x)^2 + (\Delta y)^2\bigr)}{\sqrt{(\Delta x)^2 + (\Delta y)^2}}
\]

does not have the limit 0 as $(\Delta x,\Delta y) \to (0,0)$.


4.7. Let $f(0,0) = 0$, $f(x,y) = \dfrac{[\ldots]}{\sqrt{x^2+y^2}}$ for $(x,y) \ne (0,0)$.

a. Sketch the graph of $z = f(x,y)$ near the origin. Use a polar coordinate
overlay to clarify the picture.

b. Show that the partial derivatives $f_x(0,0)$ and $f_y(0,0)$ exist, and determine
their values.

c. Show that the directional derivative $D_{\mathbf{u}}f(0,0)$ does not exist if $\mathbf{u}$ is not an
axis direction. Explain this result in terms of the graph of $f$.

d. Conclude that $f$, like the “manta ray” counterexample, fails to be differentiable
at the origin. (In fact, for both this function and the “manta ray,”
$f(x,y) = O(1)$ is true but $f(x,y) = o(1)$ is false.)


4.8. Let $f(0,0) = 0$, $f(x,y) = \dfrac{[\ldots]}{x^4+y^2}$ for $(x,y) \ne (0,0)$.

a. Sketch the graph of $z = f(x,y)$ near the origin. (A polar coordinate overlay
is not as helpful here; it does not simplify our view of the graph.)

b. Show that the directional derivative $D_{\mathbf{u}}f(0,0)$ exists and equals 0 in every
direction $\mathbf{u}$. (In particular, $f_x(0,0) = f_y(0,0) = 0$; therefore, if $f$ were differentiable
at $(0,0)$, the tangent plane to its graph at the origin would be
the $(x,y)$-plane.)

c. Compute the partial derivatives $f_x(x,y)$ and $f_y(x,y)$ at any arbitrary point,
and show they are not continuous at $(x,y) = (0,0)$.

d. Add to your sketch in part (a) the curve $z = f(x,x^2)$ in the graph of $f$
that lies over the parabola $y = x^2$. Show that $z = x/2$ along the parabola,
implying that $f$ vanishes exactly to order 1 on the parabola. Conclude that
$f$ cannot be differentiable at the origin.

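4.9. Let $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^2$ and $\mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^2$ be given by

[…]

a. Determine the derivative $d\mathbf{f}_{\mathbf{u}}$ at an arbitrary point $\mathbf{u} = (u,v)$. Does the
derivative depend on $(u,v)$? How is $d\mathbf{f}_{\mathbf{u}}$ related to $\mathbf{f}$?

b. Determine $d\mathbf{g}_{\mathbf{x}}$ and then use the (components of the) substitution $\mathbf{x} = \mathbf{f}(\mathbf{u})$
to express the derivative in terms of $\mathbf{u}$: $d\mathbf{g}_{\mathbf{x}} = d\mathbf{g}_{\mathbf{f}(\mathbf{u})}$.

c. Compute the components of the composite map $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$. That is, determine
$r$ and $s$ in terms of $u$ and $v$; $\mathbf{h}(\mathbf{u}) = \mathbf{g}(\mathbf{f}(\mathbf{u}))$.

d. Determine $d\mathbf{h}_{\mathbf{u}}$ and verify that $d\mathbf{h}_{\mathbf{u}} = d\mathbf{g}_{\mathbf{f}(\mathbf{u})}\cdot d\mathbf{f}_{\mathbf{u}}$. (This is the chain rule;
see Theorem 4.9, p. 132.)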

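4.10. Let $\mathbf{f}: U^2 \to \mathbb{R}^2$ be the polar coordinate map, and let $\mathbf{f}^{-1}: \mathbb{R}^2 \to U^2$ be its
inverse:

\[
\mathbf{f}: \begin{cases} x = r\cos\theta,\\ y = r\sin\theta, \end{cases}
\qquad
\mathbf{f}^{-1}: \begin{cases} r = \sqrt{x^2+y^2},\\ \theta = \arctan(y/x). \end{cases}
\]

a. Compute the matrix of the derivative $d\mathbf{f}^{-1}_{\mathbf{x}}$ at an arbitrary point $\mathbf{x} = (x,y)$.

b. Compute the inverse matrix $(d\mathbf{f}_{\mathbf{r}})^{-1}$ at an arbitrary point $\mathbf{r} = (r,\theta)$.

c. Use the coordinate change $\mathbf{x} = \mathbf{f}(\mathbf{r})$ to express $d\mathbf{f}^{-1}_{\mathbf{x}}$ in terms of $\mathbf{r}$: $d\mathbf{f}^{-1}_{\mathbf{x}} =
d\mathbf{f}^{-1}_{\mathbf{f}(\mathbf{r})}$. Then verify that $(d\mathbf{f}_{\mathbf{r}})^{-1} = d\mathbf{f}^{-1}_{\mathbf{f}(\mathbf{r})}$. (The derivatives of inverse maps
are themselves inverses of each other. To see this it is first necessary, however,
to express them in terms of the same variables. See Corollary 4.11,
p. 136.)

d. Compute the determinants

\[
\det d\mathbf{f}^{-1}_{\mathbf{x}} \quad\text{and}\quad \det d\mathbf{f}_{\mathbf{r}},
\]

and show that they are reciprocals. Use an appropriate change of variables
on one of the expressions to compare the two.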


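4.11. Let $\mathbf{x} = \mathbf{f}(\mathbf{r})$ be the polar coordinate map of the previous exercise, and let
$\mathbf{u} = \mathbf{g}(\mathbf{x})$ be the map

\[
\mathbf{g}: \begin{cases} u = x^2 + y^2,\\ v = x^2 - y^2. \end{cases}
\]

a. Determine $d\mathbf{g}_{\mathbf{x}}$ at an arbitrary point $\mathbf{x} = (x,y)$; then use the coordinate
change $\mathbf{x} = \mathbf{f}(\mathbf{r})$ to express the derivative in terms of $\mathbf{r} = (r,\theta)$: $d\mathbf{g}_{\mathbf{f}(\mathbf{r})}$.

b. Compute the components of the composite map $\mathbf{h} = \mathbf{g}\circ\mathbf{f}$. That is, determine
$u$ and $v$ in terms of $r$ and $\theta$; $\mathbf{u} = \mathbf{h}(\mathbf{r}) = \mathbf{g}(\mathbf{f}(\mathbf{r}))$.

c. Verify that

\[
d\mathbf{h}_{\mathbf{r}} = \begin{pmatrix} 2r & 0\\ 2r\cos 2\theta & -2r^2\sin 2\theta \end{pmatrix}
\]

and also verify that $d\mathbf{h}_{\mathbf{r}} = d\mathbf{g}_{\mathbf{f}(\mathbf{r})}\cdot d\mathbf{f}_{\mathbf{r}}$. (This is another instance of the chain
rule.)

In Exercises 4.12-4.20, $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^2$ is the quadratic map discussed in the text:

\[
\mathbf{f}: \begin{cases} x = u^2 - v^2,\\ y = 2uv. \end{cases}
\]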

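4.12. Compute $\|\mathbf{f}(\Delta\mathbf{u})\|^2$ to show that $\|\mathbf{f}(\Delta\mathbf{u})\| = \|\Delta\mathbf{u}\|^2$. Note: this means that
$\mathbf{f}(\Delta\mathbf{u})$ vanishes exactly to order 2 (by the extension of Definition 3.4 used
in Exercise 3.28). That is, there are positive constants $C_1$, $C_2$ for which

\[
C_1 \le \frac{\|\mathbf{f}(\Delta\mathbf{u})\|}{\|\Delta\mathbf{u}\|^2} \le C_2
\]

for all $\Delta\mathbf{u} \ne \mathbf{0}$. (It is evident we can take $C_1 = C_2 = 1$.)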

4.13. Let $\mathcal{U}^2 = \{(x,y) \mid y > 0\}$ be the upper half-plane, and let $\mathbf{g}_+: \mathcal{U}^2 \to \mathbb{R}^2$ be the
map

\[
\mathbf{g}_+: \begin{cases} u = \sqrt{\dfrac{\sqrt{x^2+y^2}+x}{2}},\\[3ex] v = \sqrt{\dfrac{\sqrt{x^2+y^2}-x}{2}}. \end{cases}
\]

a. Show that $\mathbf{f}(\mathbf{g}_+(\mathbf{x})) = \mathbf{x}$ for all $\mathbf{x}$ in $\mathcal{U}^2$. In other words, $\mathbf{g}_+$ is a (partial)
inverse for $\mathbf{f}$.

b. Describe the action of $\mathbf{g}_+$ in terms of polar coordinate overlays on the
source and target (cf. pp. 116-121). In other words, describe how the angle
a point makes with the positive horizontal axis changes, and
how its distance from the origin changes.

c. Describe the image $\mathbf{g}_+(\mathcal{U}^2)$.


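4.14. Show that $\mathbf{g}_+(\mathbf{x})$ (Exercise 4.13) can be extended to the two sides of the $x$-axis
($y = 0$) as follows:

\[
\begin{cases} u = \sqrt{x},\\ v = 0, \end{cases} \quad\text{if } x > 0;
\qquad
\begin{cases} u = 0,\\ v = \sqrt{|x|}, \end{cases} \quad\text{if } x < 0.
\]

What, therefore, is the image of the $x$-axis under this extension of $\mathbf{g}_+$?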

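4.15. a. Compute $d(\mathbf{g}_+)_{\mathbf{x}}$, where $\mathbf{g}_+$ is the map of Exercise 4.13.

b. Use the coordinate change $x = u^2 - v^2$, $y = 2uv$ (provided by the inverse
map $\mathbf{f}$) to express $d(\mathbf{g}_+)_{\mathbf{x}}$ in terms of $u$ and $v$, giving $d(\mathbf{g}_+)_{\mathbf{f}(\mathbf{u})}$.

c. Verify that $d(\mathbf{g}_+)_{\mathbf{f}(\mathbf{u})}$ is the inverse of $d\mathbf{f}_{\mathbf{u}}$.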

4.16. The object of this exercise is to study the action of $\mathbf{g}_+$ (Exercise 4.13) in a
microscope window centered at $\mathbf{x}_0 = (1/2, \sqrt{3}/2)$.

a. Determine the center $\mathbf{g}_+(\mathbf{x}_0)$ of the target window.

b. Explain why $\mathbf{g}_+$ maps the 60°-line in the window at $\mathbf{x}_0$ to the 30°-line in
the target window. (Suggestion: Use results from Exercise 4.13.)

c. Show that $d(\mathbf{g}_+)_{\mathbf{x}_0} = \lambda R_\theta$, for certain $\lambda > 0$ and $\theta < 0$; $R_\theta$ is rotation by $\theta$
radians. What are the values of $\lambda$ and $\theta$? Does $\mathbf{g}_+$ “look like” its derivative
$d(\mathbf{g}_+)_{\mathbf{x}_0}$ in this microscope window? Explain, in terms of what you know
about the action of $\mathbf{g}_+$.


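4.17. Let $\mathcal{L}^2 = \{(x,y) \mid y < 0\}$ be the lower half-plane, and let $\mathbf{g}_-: \mathcal{L}^2 \to \mathbb{R}^2$ be the
map

\[
\mathbf{g}_-: \begin{cases} u = -\sqrt{\dfrac{\sqrt{x^2+y^2}+x}{2}},\\[3ex] v = \sqrt{\dfrac{\sqrt{x^2+y^2}-x}{2}}. \end{cases}
\]

Note that $\mathbf{g}_-$ differs from $\mathbf{g}_+$ only in the sign of $u$.

a. Show that $\mathbf{f}(\mathbf{g}_-(\mathbf{x})) = \mathbf{x}$ for all $\mathbf{x}$ in $\mathcal{L}^2$. In other words, $\mathbf{g}_-$ is also a partial
inverse for $\mathbf{f}$.

b. Describe the action of $\mathbf{g}_-$ in terms of polar coordinate overlays on the
source and target (cf. Exercise 4.13).

c. Determine the image $\mathbf{g}_-(\mathcal{L}^2)$.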


4.18. a. Show that $\mathbf{g}_-(\mathbf{x})$ can be extended to the two sides of the $x$-axis ($y = 0$) as
follows:

\[
\begin{cases} u = -\sqrt{x},\\ v = 0, \end{cases} \quad\text{if } x > 0;
\qquad
\begin{cases} u = 0,\\ v = \sqrt{|x|}, \end{cases} \quad\text{if } x < 0.
\]

What, therefore, is the image of the $x$-axis under this extension of $\mathbf{g}_-$?

b. Show that $\mathbf{g}_+$ and $\mathbf{g}_-$ agree on the negative $x$-axis but disagree on the positive
$x$-axis. Excluding, therefore, the positive $x$-axis from the domain of
$\mathbf{g}_-$, explain how $\mathbf{g}_+$ and $\mathbf{g}_-$ together define a single map on the whole
plane $\mathbb{R}^2$ that serves as an inverse for $\mathbf{f}$. Show that this combined map is
not continuous across the positive $x$-axis.


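4.19. a. Compute $d(\mathbf{g}_-)_{\mathbf{x}}$, where $\mathbf{g}_-$ is given in Exercise 4.17.

b. Use the coordinate change $x = u^2 - v^2$, $y = 2uv$ (provided by the inverse
map $\mathbf{x} = \mathbf{f}(\mathbf{u})$) to express $d(\mathbf{g}_-)_{\mathbf{x}}$ in terms of $\mathbf{u} = (u,v)$, giving $d(\mathbf{g}_-)_{\mathbf{f}(\mathbf{u})}$.

c. Verify that $d(\mathbf{g}_-)_{\mathbf{f}(\mathbf{u})}$ is the inverse of $d\mathbf{f}_{\mathbf{u}}$.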


4.20. The object of this exercise is to study the action of $\mathbf{g}_-$ (Exercise 4.17) in a
microscope window centered at $\mathbf{x}_0 = (0, -9/4)$.

a. Determine the center $\mathbf{g}_-(\mathbf{x}_0)$ of the target window.

b. Explain why $\mathbf{g}_-$ maps the 270°-line in the window at $\mathbf{x}_0$ to the 135°-line in
the target window.

c. Show that $d(\mathbf{g}_-)_{\mathbf{x}_0} = \lambda R_\theta$, for certain $\lambda > 0$ and $\theta < 0$. What are the values
of $\lambda$ and $\theta$? Does $\mathbf{g}_-$ “look like” its derivative $d(\mathbf{g}_-)_{\mathbf{x}_0}$ in this microscope
window? Explain, in terms of what you know about the action of $\mathbf{g}_-$.

4.21. Let $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^2 : (u,v) \mapsto (x,y)$ be the map defined by

[…]

a. Describe the global behavior of $\mathbf{f}$ by showing what happens to an ordinary
Cartesian grid in the source. In particular, indicate the effect of the equation
$y = v^2$. This map is sometimes called a fold; your picture should explain
why.

b. Determine the derivative $d\mathbf{f}_{(a,b)}$ at each point $(u,v) = (a,b)$.

c. Show that the local area multiplier of $\mathbf{f}$ at $(a,b)$ is $b$. Hence the local area
multiplier along the horizontal axis (the $u$-axis) is 0; why? What feature of
the map $\mathbf{f}$ does this reflect?

d. Sketch the effect of $\mathbf{f}$ in a microscope window centered at $(u,v) = (3,2)$,
and indicate how this corresponds to the effect of the derivative $d\mathbf{f}_{(3,2)}$.

e. Sketch the effect of $\mathbf{f}$ in a microscope window centered at $(u,v) = (3,0)$,
and indicate how this corresponds to the effect of the derivative $d\mathbf{f}_{(3,0)}$.
(Note: the local area multiplier here is 0, and the derivative $d\mathbf{f}_{(3,0)}$ is noninvertible.
Thus we do not expect $\mathbf{f}$ to look like $d\mathbf{f}_{(3,0)}$ in a microscope
window centered at $(3,0)$.)

4.22. a. Obtain the derivative $d\mathbf{q}_{\mathbf{a}}$ of the map

\[
\mathbf{q}: \begin{cases} x = u^3 - 3uv^2,\\ y = 3u^2v - v^3. \end{cases}
\]

b. Show that $d\mathbf{q}_{(a,b)}$ is a similarity transformation (cf. p. 118), that is, a rotation
by an angle $\theta$ combined with a uniform dilation by a factor $\lambda$. Determine
$\theta$ and $\lambda$ in terms of $a$ and $b$. Conclude that $\mathbf{q}$ is conformal on the
whole plane minus the origin. Why must the origin be excluded?

c. Use a polar coordinate overlay to create a description of the action of $\mathbf{q}$ that
is analogous to the description of the quadratic map (as one that doubles
angles and squares distances from the origin).

4.23. Repeat all the steps of the previous exercise for the map

\[
\begin{cases} x = u^4 - 6u^2v^2 + v^4,\\ y = 4u^3v - 4uv^3. \end{cases}
\]

In particular, find $\lambda = 4(a^2 + b^2)^{3/2}$ and $\theta = \arctan$ […]

4.24. a. Show that the local area magnification factor at the parameter point $(\theta,\varphi)$
on the unit sphere is $\cos\varphi$.

b. Show that the arc length of the “parallel of latitude” at latitude $\varphi_0$ is
$2\pi\cos\varphi_0$. Show that the arc length of a “meridian of longitude” at longitude
$\theta_0$ is $\pi$, independent of $\theta_0$.

4.25. a. Compute the matrix of the derivative $d\mathbf{f}_{(\pi/3,\pi/6)}$ of the unit sphere map $\mathbf{f}$,
and describe its image in $\mathbb{R}^3$.

b. Sketch the image of the unit sphere map $\mathbf{f}$ in a microscope window at the
point $(\pi/3, \pi/6)$. Show that the window can be made small enough so the
image is indistinguishable from the image of the derivative $d\mathbf{f}_{(\pi/3,\pi/6)}$.

4.26. Show that the local area magnification factor at the parameter point $(p,q)$ on
the crosscap is $\sqrt{p^2 + 4q^2 + 4q^4}$. Confirm that the local area magnification
factors at the points $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ discussed in the text are, respectively, 1, 3,
and 0.

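4.27. Show that the window equation

\[
\Delta\mathbf{x} = \mathbf{f}(\mathbf{p} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{p}) = d\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u}) + \mathbf{R}_{1,\mathbf{p}}(\Delta\mathbf{u})
\]

for the crosscap parametrization $\mathbf{f}$ at an arbitrary point $\mathbf{p} = (p,q)$ can be written
with the remainder $\mathbf{R}_{1,\mathbf{p}}$ explicitly as

\[
\begin{array}{rcl}
\Delta\mathbf{x} &=& d\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u}) + \mathbf{R}_{1,\mathbf{p}}(\Delta\mathbf{u}),\\
\Delta x &=& \Delta u + 0,\\
\Delta y &=& q\,\Delta u + p\,\Delta v + \Delta u\,\Delta v,\\
\Delta z &=& -2q\,\Delta v + \bigl(-(\Delta v)^2\bigr).
\end{array}
\]

Note that $\mathbf{R}_{1,\mathbf{p}}(\Delta\mathbf{u})$ is purely quadratic in $\Delta\mathbf{u}$ and is independent of $\mathbf{p}$.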

In exercises 4.28-4.30, the map $\mathbf{t}_{R,a} : U^2 \to \mathbb{R}^3$ (where 0 < a < R) parametrizes
a torus:

$$\mathbf{t}_{R,a}:\quad
\begin{cases}
x = (R + a\cos\varphi)\cos\theta,\\
y = (R + a\cos\varphi)\sin\theta,\\
z = a\sin\varphi,
\end{cases}
\qquad
\begin{aligned}
0 &\le \theta \le 2\pi,\\
-\pi &\le \varphi \le \pi.
\end{aligned}$$

4.28. Make a sketch of the entire image of $\mathbf{t}_{3,1}$. From this, describe what R and a
measure on the torus. What happens to the shape of the torus if R < a?

4.29. a. Compute the (matrix of the) derivative $d(\mathbf{t}_{R,a})_{(\theta,\varphi)}$ at an arbitrary point
$(\theta, \varphi)$ and for arbitrary R and a.

b. Determine the local area magnification factor of $\mathbf{t}_{R,a}$ at an arbitrary point
$(\theta, \varphi)$, and confirm that it is independent of $\theta$. For which value of $\varphi$ is
the factor largest, and for which is it smallest? Is this consistent with your
sketch of the action of $\mathbf{t}_{3,1}$?

4.30. a. Sketch the image of $\mathbf{t}_{3,1}$ in a microscope window at $(\theta, \varphi) = (\pi/4, \pi/2)$,
and compare it to the image of the derivative of $\mathbf{t}_{3,1}$ at that point.

b. Do the same at the point $(\theta, \varphi) = (0, \pi/2)$.


4.31. Let x = g(v) > 0 be a smooth function for $a \le v \le b$. The surface parametrized
as

$$\mathbf{r}:\quad
\begin{cases}
x = g(v)\cos\theta,\\
y = g(v)\sin\theta,\\
z = v,
\end{cases}
\qquad
\begin{aligned}
0 &\le \theta \le 2\pi,\\
a &\le v \le b,
\end{aligned}$$

is a surface of revolution. The curve x = g(z) is called its generator.

a. Determine the derivative $d\mathbf{r}_{(\theta,v)}$ at an arbitrary point, and determine the
local area magnification factor.

b. Confirm that the local area magnification factor is independent of $\theta$, and is
smallest where g(v) has its minimum.

c. Show that when $g(v) = 2 + \sin kv$, and k is properly chosen, the largest local
area magnification factor does not occur where g(v) has its maximum.

4.32. Suppose y = f(x) is differentiable at x = a; show that $df_a(\Delta x) = f'(a)\cdot\Delta x$.

4.33. Show that a linear map is differentiable everywhere, and is its own derivative:
if $L : \mathbb{R}^p \to \mathbb{R}^q$ is linear then $dL_{\mathbf{a}} = L$, for every $\mathbf{a}$ in $\mathbb{R}^p$.

4.34. Let $\mathbf{f} : U^2 \to \mathbb{R}^2 : (\rho, \varphi) \mapsto (u, v)$ be the polar coordinate map (p. 134), and let

$$\mathbf{f}^{-1}:\quad \rho = \sqrt{u^2 + v^2}, \qquad \varphi = \arctan(v/u),$$

be its inverse. Show that their derivatives are inverses; that is, show that

$$d(\mathbf{f}^{-1})_{\mathbf{f}(\rho,\varphi)} = \left(d\mathbf{f}_{(\rho,\varphi)}\right)^{-1}.$$

Note that, for the equality to hold, the two sides must be expressed in terms of
the same variables; thus, the derivative $d\mathbf{f}^{-1}$ must be determined at the point
$(u,v) = \mathbf{f}(\rho,\varphi) = (\rho\cos\varphi, \rho\sin\varphi)$.

4.35. Determine the Jacobians $\partial(x,y)/\partial(u,v)$ and $\partial(u,v)/\partial(x,y)$ when

$$x = u^3 - 3uv^2, \qquad y = 3u^2 v - v^3.$$


4.36. Let $\varphi(t) = \Phi(x(t), y(t))$, where $\Phi$ is differentiable and x = x(t) and y = y(t)
are differentiable functions of t.

a. Verify that $\varphi'(t) = \Phi_x(x(t),y(t))\,x'(t) + \Phi_y(x(t),y(t))\,y'(t)$, or, more briefly,
$\varphi' = \operatorname{grad}\Phi \cdot (x', y')$.

b. Suppose $\Phi$ is a potential for F: $\mathbf{F}(\mathbf{x}) = \operatorname{grad}\Phi(\mathbf{x})$ (cf. p. 25), and C is an
oriented curve. Fill in the details to show that

$$\int_C \mathbf{F}\cdot d\mathbf{x} = \int_a^b d\varphi = \Phi(\mathbf{x})\,\Big|_{\text{start of } C}^{\text{end of } C}.$$


4.37. Suppose $\mathbf{f} : U^2 \to \mathbb{R}^2 : (u,v) \mapsto (x(u,v), y(u,v))$ is a continuously
differentiable map, not necessarily invertible. Let C be a piecewise-smooth oriented
curve in $U^2$ for which f(C) is also piecewise smooth and oriented (cf. p. 9).
Assume C and f(C) have a common decomposition into smooth oriented
curves:

$$C = C_1 + \cdots + C_m, \qquad \mathbf{f}(C) = \mathbf{f}(C_1) + \cdots + \mathbf{f}(C_m);$$

each $C_i$ and $\mathbf{f}(C_i)$ is either a simple closed curve or a simple curve (i.e., no
self-intersections). If $\mathbf{u} = \mathbf{u}_i(t)$, $a_i \le t \le b_i$, is a continuously differentiable
parametrization of $C_i$, $i = 1, \ldots, m$, then $\mathbf{x} = \mathbf{f}(\mathbf{u}_i(t))$ is a continuously
differentiable parametrization of $\mathbf{f}(C_i)$.

a. Let P(x,y) and Q(x,y) be continuously differentiable functions defined on
$\mathbf{f}(U^2)$; show that

$$\int_{\mathbf{f}(C_i)} P\,dx + Q\,dy = \int_{C_i} (P^* x_u + Q^* y_u)\,du + (P^* x_v + Q^* y_v)\,dv.$$

Here, $P^* = P^*(u,v) = P(x(u,v), y(u,v))$, $x_u = \partial x/\partial u$, and so forth. The
equation describes how the path integral on the left is transformed into the
one on the right by the change of variables (x,y) = f(u,v).

b. Deduce that

$$\int_{\mathbf{f}(C)} P\,dx + Q\,dy = \int_{C} (P^* x_u + Q^* y_u)\,du + (P^* x_v + Q^* y_v)\,dv.$$


4.38. Let $\mathbf{f} : (r, \theta) \mapsto (x, y)$ be the polar coordinate map, and let C be any
continuously differentiable oriented curve in the $(r, \theta)$-plane with r > 0. Determine
how the path integral

$$I = \int_{\mathbf{f}(C)} \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy$$

is transformed by polar coordinates. Use the transformed integral to show that

$$I = \Delta\theta = \theta\,\Big|_{\text{start of } \mathbf{f}(C)}^{\text{end of } \mathbf{f}(C)}.$$


4.39. Let $\mathbf{f} : (x,y) = (u^2 - v^2, 2uv)$ be the quadratic map, and let C be any
continuously differentiable oriented curve in the (u,v)-plane that avoids the origin.
Show that

$$\int_{\mathbf{f}(C)} \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy
= 2\int_{C} \frac{-v}{u^2 + v^2}\,du + \frac{u}{u^2 + v^2}\,dv
= 2\arctan\left(\frac{v}{u}\right)\Big|_{\text{start of } C}^{\text{end of } C}.$$









Chapter 5 

Inverses 


Abstract Inverses help us solve equations: if $5 = x^3$, then $x = \sqrt[3]{5}$. Equations also
imply relations between their variables. For example, if $x^2 + y^2 - 1 = 0$, then we
can “solve for y” to get either $y = +\sqrt{1 - x^2}$ or $y = -\sqrt{1 - x^2}$. We soon learn that a
formula for an inverse or for an implicitly defined function is seldom available.
Usually, the most we can expect to know is that such a function exists. As we show, even
this apparently limited knowledge can simplify and clarify our view of a problem, 
the same way that changing coordinates can simplify an integration. In this chapter, 
we look only briefly at explicit formulas. We give the bulk of our attention to the 
way inverses give us a powerful tool for understanding maps, and to the conditions 
that guarantee their existence. The next chapter does the same for implicitly defined 
functions. 


5.1 Solving equations 

The first inverse operations we learn are subtraction and division; after all, x = y/m
is the inverse of y = mx. And division is the first place where we see that an inverse
may not exist: “You cannot divide by zero” is the way we say that y = 0 × x has no
inverse. We use subtraction and division to solve equations, at the start, just linear
equations of the form y = mx + b. After this come polynomial equations and the
square root $x = \sqrt{y}$, introduced as the inverse of $y = x^2$. The square root function
shows us that an inverse may have a restricted domain of definition ($y \ge 0$ in this
case) and a restricted range (we need $x = -\sqrt{y}$ along with $x = +\sqrt{y}$).

For each new function in calculus, an inverse is introduced with it; the exponential
and logarithm functions provide a good example. The immediate use of inverses
is in solving equations, including even those that give alternate formulas for inverses
themselves. For example, the hyperbolic cosine function y = cosh x has an inverse
that is written simply x = arccosh y (or $x = \cosh^{-1} y$). But we can get a different,
and possibly more useful, expression for the inverse by solving the defining equation

$$y = \cosh x = \frac{e^x + e^{-x}}{2}$$

for x algebraically. Some simple computations give

$$2ye^x = e^{2x} + 1 \quad\text{and then}\quad e^{2x} - 2ye^x + 1 = 0.$$

Notice that this is an ordinary quadratic equation in $e^x$; the quadratic formula (an
inverse!) gives

$$e^x = \frac{2y \pm \sqrt{4y^2 - 4}}{2} = y \pm \sqrt{y^2 - 1}.$$

We can finally solve for x itself by using the logarithm (yet another inverse):

$$x = \ln\left(y \pm \sqrt{y^2 - 1}\right).$$

The “±” in the formula for x means that the inverse splits into two parts, or
branches, with a separate formula for each. The graph of x = arccosh y on the right,
above, helps us see why. It is the reflection of the graph of y = cosh x across the line
y = x. It splits into two halves at the point (x = 0) where $y' = 0$:

$$\begin{aligned}
\text{upper, } x \ge 0:&\quad x = \ln\left(y + \sqrt{y^2 - 1}\right),\\
\text{lower, } x \le 0:&\quad x = \ln\left(y - \sqrt{y^2 - 1}\right).
\end{aligned}$$

The two branches imply that we should think of the inverse as a 1-2 map: for each
y > 1, the inverse gives two x-values.

There is more to say here: the graphs of those two branches are symmetric across
the y-axis, implying that the two corresponding x-values must be negatives of each
other. In other words, the equation of the lower half should be

$$x = -\ln\left(y + \sqrt{y^2 - 1}\right).$$

There is no conflict, however. Note that

$$\left(y - \sqrt{y^2 - 1}\right)\left(y + \sqrt{y^2 - 1}\right) = y^2 - (y^2 - 1) = 1,$$

so

$$\ln\left(y - \sqrt{y^2 - 1}\right) = \ln\frac{1}{y + \sqrt{y^2 - 1}} = -\ln\left(y + \sqrt{y^2 - 1}\right).$$

Finally, notice that the term $y^2 - 1$ under the radical implies the inverse is defined
only for $y \ge 1$, a fact borne out by the graph.
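The two branch formulas can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `arccosh_upper` and `arccosh_lower` are illustrative choices. It confirms that both formulas invert cosh and that the two x-values are negatives of each other.

```python
import math

def arccosh_upper(y):
    """Upper branch (x >= 0): x = ln(y + sqrt(y^2 - 1)), defined for y >= 1."""
    return math.log(y + math.sqrt(y * y - 1.0))

def arccosh_lower(y):
    """Lower branch (x <= 0): x = ln(y - sqrt(y^2 - 1)), defined for y >= 1."""
    return math.log(y - math.sqrt(y * y - 1.0))

for y in (1.0, 1.5, 3.0, 10.0):
    x_up = arccosh_upper(y)
    x_lo = arccosh_lower(y)
    # Both branches map back to y under cosh, and x_lo = -x_up.
    print(y, x_up, x_lo, math.cosh(x_up), math.cosh(x_lo))
```

For large y the lower-branch formula subtracts two nearly equal numbers, so in floating-point work the equivalent form $-\ln(y + \sqrt{y^2 - 1})$ is the safer one to evaluate.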

In a similar way, you can show that $\operatorname{arcsinh} y = \ln\left(y + \sqrt{y^2 + 1}\right)$ (there is no “±”
ambiguity here) and use this formula with the pullback substitution y = sinh x to
show that

$$\int \frac{dy}{\sqrt{1 + y^2}} = \ln\left(y + \sqrt{1 + y^2}\right).$$

See the exercises for this and other questions involving the hyperbolic functions and
their inverses.

Inverses play a crucial role in solving problems even when there is no formula
or explicit expression for the inverse in terms of elementary functions. For example,
consider the differential equation

$$\frac{dy}{dx} = \frac{y}{y - 1}.$$

This equation, as written, indicates that y changes with x, so x is the independent
variable. Thus, we are looking for a function y = f(x) for which the equation

$$f'(x) = \frac{f(x)}{f(x) - 1}$$

is an identity in x, at least for all x in some interval.




A solution is shown above, at the right. We can obtain this solution and others
by using the method of separation of variables. The method begins by rewriting the
original differential equation as

$$dx = \frac{y - 1}{y}\,dy = \left(1 - \frac{1}{y}\right) dy$$

(a differential equation was originally “an equation involving differentials”).
Integrating this, we get an expression involving an arbitrary constant c,

$$x = y - \ln(y) + c, \qquad y > 0,$$

whose graph is shown on the left, above. Of course, the function y = f(x) we seek
is the inverse, whose graph is shown on the right.

We put a hole in the graph of $x = y - \ln y + c$ at (x,y) = (1 + c, 1) because
the original differential equation is undefined when y = 1. The inverse hence has
two branches; we call them $y = f_{c,1}(x)$ and $y = f_{c,2}(x)$. The branches have different
ranges, $0 < f_{c,1}(x) < 1$, $1 < f_{c,2}(x)$, but the same domain, x > 1 + c. We can describe
the two functions by their graphs or by the words “the two branches of the inverse
of $x = y - \ln y + c$.” They have no formulas.
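Although the branches have no formulas, their values can be computed. The sketch below is an illustration only: it uses bisection, a method the text does not discuss, and the names `x_of_y` and `solve_branch` are hypothetical. It inverts $x = y - \ln y + c$ on either branch, using the fact that the function is decreasing on 0 < y < 1 and increasing on y > 1.

```python
import math

def x_of_y(y, c):
    """The separated solution x = y - ln(y) + c, for y > 0."""
    return y - math.log(y) + c

def solve_branch(x, c, which):
    """Value at x of branch 1 (0 < y < 1) or branch 2 (y > 1) of the inverse.

    Bisection on a bracketing interval; requires x > 1 + c.
    """
    t = x - c
    if t <= 1.0:
        raise ValueError("both branches have domain x > 1 + c")
    if which == 1:
        lo, hi = math.exp(-t), 1.0    # x_of_y(lo) = x + lo > x > x_of_y(hi)
    else:
        lo, hi = 1.0, 2.0 * t + 2.0   # x_of_y(lo) < x < x_of_y(hi), as ln z < z/2
    f_lo = x_of_y(lo, c) - x
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = x_of_y(mid, c) - x
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid     # root lies in the upper half
        else:
            hi = mid                  # root lies in the lower half
    return 0.5 * (lo + hi)

# With c = 0 and x = 2, the two branches give two different y-values.
y1 = solve_branch(2.0, 0.0, 1)
y2 = solve_branch(2.0, 0.0, 2)
print(y1, y2, x_of_y(y1, 0.0), x_of_y(y2, 0.0))
```

An initial condition fixes both the constant and the branch: requiring y = 2 at x = 0 gives c = ln 2 − 2, and because 2 > 1 the solution lies on the second branch.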

Separate branches here are welcomed, because they provide the flexibility 
needed to solve different initial-value problems. For example, sketched below are 
the two particular solutions $f_\alpha$ and $f_\beta$ to the differential equation that satisfy the
different initial conditions

$$f_\alpha(1) = \tfrac{1}{2}, \qquad f_\beta(0) = 2.$$



These examples raise an obvious question: what is the solution if the initial value is
not positive? We can extend our formula for x to y < 0 by

$$x = y - \ln(-y) + k,$$

where k is a constant unconnected to c. The inverse of this function is the branch
we need; its range is y < 0. See the exercises.


Our rather ad hoc way of solving equations can, with some luck, be carried over
to several functions of several variables, for example, to produce formulas for the
inverse of a map. Consider the quadratic map

$$\mathbf{f}:\quad x = u^2 - v^2, \qquad y = 2uv,$$

from Chapter 4. The inverse of f expresses u and v in terms of x and y. We do this
(that is, we solve for u and v) by isolating each of these variables in its own separate
equation. The key is to notice that

$$x^2 + y^2 = u^4 - 2u^2v^2 + v^4 + 4u^2v^2 = (u^2 + v^2)^2.$$

We are then able to isolate u and v by adding and then by subtracting the pair of
equations

$$u^2 + v^2 = \sqrt{x^2 + y^2}, \qquad u^2 - v^2 = x;$$

this gives us the components of $\mathbf{f}^{-1}$:

$$u = \pm\sqrt{\frac{\sqrt{x^2 + y^2} + x}{2}}, \qquad v = \pm\sqrt{\frac{\sqrt{x^2 + y^2} - x}{2}}.$$

(The expressions for u and v are real because $\sqrt{x^2 + y^2} = u^2 + v^2 \ge 0$.)

The “±” signs put the image point in the four different quadrants of the (u,v)-plane.
To decide which signs to use, recall that the original map $\mathbf{f} : (u,v) \mapsto (x,y)$
“doubled angles.” In particular, it mapped the first quadrant of the (u,v)-plane to the
upper half-plane $y \ge 0$ and the second quadrant to the lower half-plane $y \le 0$. Thus
$\mathbf{f}^{-1}$ maps $y \ge 0$ to the first quadrant,

$$u = +\sqrt{\frac{\sqrt{x^2 + y^2} + x}{2}}, \qquad v = +\sqrt{\frac{\sqrt{x^2 + y^2} - x}{2}}, \qquad \text{if } y \ge 0,$$

and $y \le 0$ to the second quadrant,

$$u = -\sqrt{\frac{\sqrt{x^2 + y^2} + x}{2}}, \qquad v = +\sqrt{\frac{\sqrt{x^2 + y^2} - x}{2}}, \qquad \text{if } y \le 0.$$

These are the formulas for $g_+$ (Exercise 4.13) and $g_-$ (Exercise 4.17) on pages 144ff.
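What happens on the overlap y = 0? If x < 0 and y = 0, then

$$v = \sqrt{\frac{\sqrt{x^2} - x}{2}} = \sqrt{\frac{-x - x}{2}} = \sqrt{-x} = \sqrt{|x|} > 0$$

and

$$u = \pm\sqrt{\frac{\sqrt{x^2} + x}{2}} = \pm\sqrt{\frac{-x + x}{2}} = 0.$$

The two pairs of formulas agree: for both, the image of the negative x-axis is the
positive v-axis. On the other hand, if x > 0 and y = 0, then |x| = x, so

$$v = \sqrt{\frac{x - x}{2}} = 0 \qquad\text{and}\qquad u = \pm\sqrt{\frac{x + x}{2}} = \pm\sqrt{x} = \pm\sqrt{|x|}.$$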





































Here there is a conflict: the first pair of formulas (where the sign of u is “+”) maps
the positive x-axis to the positive u-axis, but the second pair maps it to the negative
u-axis.

One way to eliminate the conflict is to remove the positive x-axis from the domain
of the second pair of formulas. Then $\mathbf{f}^{-1}$ is well defined on the whole plane $\mathbb{R}^2$.
However, there is a cost: along the positive x-axis, $\mathbf{f}^{-1}$ is discontinuous. For example,
as the points p and q, below, become arbitrarily close, their images do not.




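There is a radical way to solve this problem that builds on the fact that f is a 2-1 map
(because diametrically opposite points in the (u,v)-plane have the same image
under f). This suggests that $\mathbf{f}^{-1}$ is, more properly, a 1-2 map and therefore has a second
branch. (Consider, for a moment, the 1-dimensional analogue of f: $x = f(u) = u^2$;
$f^{-1}$ has the two familiar branches $u = +\sqrt{x}$ and $u = -\sqrt{x}$. Not coincidentally, the
two u-values are opposites.) If $\mathbf{f}^{-1}$ already assigns to the point $\mathbf{x}$ the point $\mathbf{u}$ in the
upper half-plane, then we can easily get the second branch by having $\mathbf{f}^{-1}$ assign to
$\mathbf{x}$ also the diametrically opposite point $-\mathbf{u}$ in the lower half-plane:

$$\mathbf{f}^{-1}(\mathbf{x}) = \pm\mathbf{u}.$$

Here is a set of formulas that expresses both branches in terms of components. The
two branches are distinguished from each other by the “±” signs:

$$(u,v) = \mathbf{f}^{-1}(x,y) =
\begin{cases}
\pm\left(\sqrt{\dfrac{\sqrt{x^2 + y^2} + x}{2}},\ \sqrt{\dfrac{\sqrt{x^2 + y^2} - x}{2}}\right), & y \ge 0,\\[2ex]
\pm\left(-\sqrt{\dfrac{\sqrt{x^2 + y^2} + x}{2}},\ \sqrt{\dfrac{\sqrt{x^2 + y^2} - x}{2}}\right), & y \le 0.
\end{cases}$$

The second branch eliminates the discontinuity along the positive x-axis. For
example, $\mathbf{f}^{-1}(\mathbf{q})$ in the figure above now becomes the pair of points $\pm\mathbf{f}^{-1}(\mathbf{q})$, and
similarly $\mathbf{f}^{-1}(\mathbf{p})$ branches into $\pm\mathbf{f}^{-1}(\mathbf{p})$. Then, although $\pm\mathbf{f}^{-1}(\mathbf{q})$ is not close to
$\pm\mathbf{f}^{-1}(\mathbf{p})$, it is close to $\mp\mathbf{f}^{-1}(\mathbf{p})$.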
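The behavior of the inverse near the positive x-axis is easy to observe numerically. The sketch below is an illustration under stated assumptions: `f_inverse` is a hypothetical name for the single-valued choice whose image has v ≥ 0 (the first pair of formulas when y ≥ 0, the second when y < 0), and the `max(0.0, ...)` guards are added only to protect against rounding.

```python
import math

def f(u, v):
    """The quadratic map (u, v) -> (x, y) = (u^2 - v^2, 2uv)."""
    return (u * u - v * v, 2.0 * u * v)

def f_inverse(x, y):
    """One value of the inverse, chosen with v >= 0."""
    r = math.hypot(x, y)                    # sqrt(x^2 + y^2) = u^2 + v^2
    u = math.sqrt(max(0.0, (r + x) / 2.0))
    v = math.sqrt(max(0.0, (r - x) / 2.0))
    if y < 0:
        u = -u                              # second quadrant when y < 0
    return (u, v)

# Round trip: f(f_inverse(x, y)) recovers (x, y).
for point in [(3.0, 4.0), (3.0, -4.0), (-1.0, 0.0)]:
    print(point, f_inverse(*point), f(*f_inverse(*point)))

# Two nearby points on opposite sides of the positive x-axis.
p = f_inverse(1.0, 1e-9)
q = f_inverse(1.0, -1e-9)
print(p, q)   # far apart, but q is close to the opposite point -p
```

The last line shows the point of the second branch: the images of the two nearby points are far apart, yet each is close to the negative of the other.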

There is another important method we can use to solve equations: find the fixed
points of a suitably chosen map by iterating the map. We show immediately below
how this gives us a valuable computational tool; in Chapter 5.3, we show that it
provides the theoretical key to the proof of the inverse function theorem.

Definition 5.1 Suppose $g : X \to X$ is a map of a set X to itself; then x is a fixed
point of g if g(x) = x.

Suppose, now, we must solve the numerical equation y = f(x) for x when y is
given. Let g(x) = f(x) − y + x; then the following chain of implications shows that
every fixed point of g is a solution of f(x) = y, and conversely:

$$g(x) = x \iff f(x) - y + x = x \iff f(x) = y.$$

The formula g(x) = f(x) − y + x is just one way to construct a function whose fixed
points are the solutions to y = f(x); there are others. One familiar example is
provided by the Newton-Raphson method for finding roots of f(x) = 0:

$$g(x) = x - \frac{f(x)}{f'(x)}.$$


Another is provided by the ancient Babylonian algorithm for finding square roots.
We look at this in detail in order to see how well the fixed-point approach lends
itself to computation. Given a > 0, our goal is to find $\bar{x} > 0$ so that $\bar{x}^2 = a$. We have

$$\bar{x} = a/\bar{x} \quad\text{so}\quad 2\bar{x} = \bar{x} + a/\bar{x} \quad\text{and}\quad \bar{x} = \frac{\bar{x} + a/\bar{x}}{2}.$$

In other words, $\bar{x} = \sqrt{a}$ is a fixed point of

$$g(x) = \frac{x + a/x}{2}.$$

But g is just the function; the algorithm itself tells us how to find $\bar{x}$: pick $x_0$ arbitrarily
(but reasonably close to $\sqrt{a}$), and then set

$$x_1 = g(x_0), \quad x_2 = g(x_1), \quad x_3 = g(x_2),$$

and so on. The sequence $x_0, x_1, x_2, \ldots$ converges to the fixed point $\bar{x} = \sqrt{a}$. An
example makes it clear how rapid this convergence can be. Take a = 6 and let $x_0 = 2$.
Then

    n    x_n                  x_n^2
    1    2.5                  6.25
    2    2.45                 6.0025
    3    2.449489795918367    6.000000260308205
    4    2.449489742783179    6.000000000000004
    5    2.449489742783178    5.999999999999999

To fifteen decimal places, $\bar{x}$ = 2.449489742783178. The convergence here is
especially rapid: the number of correct digits roughly doubles with each iteration.
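The iteration is short enough to run directly. Here is a minimal sketch in Python (the function name `babylonian` is an illustrative choice, not the text's); it carries out the five steps of the example with a = 6 and $x_0 = 2$ in ordinary double-precision arithmetic, so the last printed digit may differ from the table.

```python
def babylonian(a, x0, steps):
    """Return the iterates x_1, ..., x_steps of g(x) = (x + a/x) / 2."""
    x = x0
    iterates = []
    for _ in range(steps):
        x = (x + a / x) / 2.0
        iterates.append(x)
    return iterates

# Print n, x_n, and x_n^2 for comparison with the table.
for n, x in enumerate(babylonian(6.0, 2.0, 5), start=1):
    print(n, repr(x), repr(x * x))
```

Starting farther from $\sqrt{a}$ still works for any positive $x_0$, at the cost of a few extra steps before the digit-doubling sets in.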

The Babylonian algorithm suggests the following general procedure for finding
a fixed point. Take a point $x_0$ and construct its iterates under g: $x_{n+1} = g(x_n)$,
n = 0, 1, 2, .... If the sequence has a limit, let $\bar{x}$ be that limit. Then

$$g(\bar{x}) = g(\lim x_n) = \lim g(x_n) = \lim x_{n+1} = \bar{x}$$

if g is continuous. (Continuity is needed to be sure that the limit can be taken either
before or after g is evaluated.) Thus $\bar{x}$ is a fixed point of g. The Newton-Raphson
method is implemented by the same kind of iteration.

Of course, in order to use this procedure, we have to make certain that the iterates
have a limit, and the map g is continuous. For a contraction mapping (Definition 5.3,
p. 167), these conditions are satisfied, and the contraction mapping principle
(Theorem 5.1, p. 167) then guarantees the existence of a unique fixed point.
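The general procedure can be sketched in a few lines. The code below is an illustration under stated assumptions: `fixed_point` and `newton_map` are hypothetical helper names, the stopping rule (successive iterates within a tolerance) is a practical choice not taken from the text, and no check is made that g is a contraction. It applies the procedure to the Newton-Raphson map for $f(x) = x^3 - 5$.

```python
def fixed_point(g, x0, tol=1e-12, max_iter=100):
    """Iterate x_{n+1} = g(x_n) until successive iterates agree to within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

def newton_map(f, fprime):
    """The Newton-Raphson map g(x) = x - f(x)/f'(x)."""
    return lambda x: x - f(x) / fprime(x)

# Solve x^3 = 5 by finding a fixed point of the Newton-Raphson map.
g = newton_map(lambda x: x ** 3 - 5.0, lambda x: 3.0 * x ** 2)
root = fixed_point(g, 2.0)
print(root, root ** 3)
```

The `RuntimeError` branch matters: for a map that is not a contraction the iterates may wander or diverge, which is exactly the case the contraction mapping principle rules out.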


5.2 Coordinate changes 

We already use coordinate changes in integration, to simplify an integrand or to 
convert it into a more recognizable form. In this section we put coordinate changes 
to larger use, to simplify the geometry of a map. For instance, we saw in Chapter 4 
that a map frequently “looked like” its derivative near a point. The derivative, being 
linear, was essentially simple; the resemblance between the map and its derivative 
meant that the map itself was simple, too, at least near that point. Our goal in this 
section is to explain what it means for one map to look like a second; in fact, it 
means that, when the first map is expressed using appropriate new coordinates, it 
will be identical to the second one. To see how coordinate changes play this vital 
role, we consider several examples. 

At the point x = 1, the tangent to the graph of $y = f(x) = \sqrt{x}$ is a straight line
of slope 1/2. Let us analyze f in a window centered at (x,y) = (1,1), first using
coordinates $\Delta x = x - 1$, $\Delta y = y - 1$ based at the center of the window. Then

$$\Delta y = y - 1 = \sqrt{x} - 1 = \sqrt{1 + \Delta x} - 1,$$

so the formula for f in the window coordinates is $\Delta y = -1 + \sqrt{1 + \Delta x}$. The graph
is the familiar one shown in black on the right, above. With it, in gray, is the graph
of the derivative, $\Delta y = df_1(\Delta x) = \tfrac{1}{2}\Delta x$. The black and gray graphs “share ink” near
$\Delta x = 0$, ample evidence that the square root map “looks like” its derivative there.
But we can do even more: with the proper coordinate change $\Delta x = \varphi(\Delta s)$, we can
make the formula for f become $\Delta y = \tfrac{1}{2}\Delta s$. In the new $(\Delta s, \Delta y)$ window, the graph
of f will be straight.

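How can we find $\varphi$? Because our goal is to simplify the formula for f, and
because that formula involves $\sqrt{1 + \Delta x}$, a reasonable approach is to make $1 + \Delta x$ a
perfect square. Thus, let

$$1 + \Delta x = 1 + \Delta s + \frac{(\Delta s)^2}{4} = \left(1 + \frac{\Delta s}{2}\right)^2.$$

Then

$$\Delta x = \Delta s + \frac{(\Delta s)^2}{4} = \varphi(\Delta s)$$

is a pullback substitution that does what we want:

$$\begin{aligned}
f:\ \Delta y &= -1 + \sqrt{1 + \Delta x} = -1 + \sqrt{1 + \varphi(\Delta s)}\\
&= -1 + \sqrt{1 + \Delta s + \frac{(\Delta s)^2}{4}}\\
&= -1 + \left(1 + \frac{\Delta s}{2}\right) = \tfrac{1}{2}\Delta s.
\end{aligned}$$

Thus the formula for f in the $(\Delta s, \Delta y)$ window is identical to the formula for $df_1$ in
the $(\Delta x, \Delta y)$ window.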
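The substitution is easy to test numerically. This is a minimal sketch with illustrative names (`phi`, `f_window`); it checks that composing the window formula for f with $\varphi$ gives exactly half of $\Delta s$. The identity relies on $1 + \Delta s/2 \ge 0$, so the sample values are taken with $\Delta s \ge -2$.

```python
import math

def phi(ds):
    """Pullback substitution: dx = ds + ds^2 / 4."""
    return ds + ds * ds / 4.0

def f_window(dx):
    """Square root map in window coordinates: dy = -1 + sqrt(1 + dx)."""
    return -1.0 + math.sqrt(1.0 + dx)

for ds in (-1.5, -0.5, 0.0, 0.3, 0.8):
    dy = f_window(phi(ds))
    print(ds, phi(ds), dy, ds / 2.0)   # the last two columns agree
```

For $\Delta s < -2$ the square root returns $-(1 + \Delta s/2)$ instead, which is why the window in the new coordinate stops at $\Delta s = -2$.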

Let us extend our pullback to a map $\varphi : (\Delta s, \Delta y) \mapsto (\Delta x, \Delta y)$ of one window to
the other:

$$\varphi:\quad \Delta x = \varphi(\Delta s), \qquad \Delta y = \Delta y.$$

We see the effect of $\varphi$ in the figure above, on the left. For a start, $\varphi$ pulls back
the uniform grid to the nonuniform one shown. The numbers at the bottom of the
vertical grid lines are the $\Delta x$-values in both cases. Pick a vertical line with the same
$\Delta x$ value in each of the windows; you should check that, at a point where the black
graphs cross those lines, the $\Delta y$ coordinates agree. This means that the black line
in the $(\Delta s, \Delta y)$ window is the graph of the same function (namely f) as the black
curved line in the $(\Delta x, \Delta y)$ window.

The pullback gradually stretches the grid on the left and compresses it on the
right. This is just a geometric manifestation of the nonlinearity of the map $\varphi$. Near
the origin, there is virtually no distortion in the grid. In other words, the “coordinate
change” does not change anything there. (This is a consequence of $\varphi'(0) = 1$.) The
nonlinearity of $\varphi$ makes it possible to straighten the curved graph of f. Of course,
the same nonlinearity causes the straight tangent line to bend into a curve, in this
case the parabolic curve

$$\Delta y = \tfrac{1}{2}\varphi(\Delta s) = \frac{\Delta s}{2} + \frac{(\Delta s)^2}{8}.$$

In terms of its own coordinate, the $(\Delta s, \Delta y)$ window covers the horizontal range
$-2 \le \Delta s \le 2(-1 + \sqrt{2}) \approx 0.8$. The points $\Delta s = -2$, $-1$, and 0.8 are marked on the
$\Delta s$-axis; compare them to nearby values of $\Delta x$.

We just converted f into its derivative by changing the source variable x. We can
accomplish the same thing in a different way by making an appropriate change in
the target variable y. In the exercises you are asked to find an explicit formula for a
push-forward substitution $\Delta w = \psi(\Delta y)$ that converts

$$f:\ \Delta y = -1 + \sqrt{1 + \Delta x}$$

into $\Delta w = \tfrac{1}{2}\Delta x$. The figure below shows the form that the coordinate change takes:
to straighten the graph of f, the bottom of the $(\Delta x, \Delta y)$ window must be compressed
(quite severely near $\Delta y = -1$), and the top stretched.




The two maps $\varphi$ and $\psi$ suggest a general principle: to convert a curved graph
to a straight one, plot it on a suitable nonuniform grid. Perhaps the most familiar
example of this is semi-log graph paper, on which an exponential function plots as
a straight line. We take this now as our second example of a coordinate change that
simplifies the geometry of functions.

To be concrete, consider the function $g(x) = 3 \times 10^{0.1x}$. We use base 10 here
because the usual semi-log paper is geared to it (rather than to base e, for example).
On the left, below, is the graph of g; it has, of course, the familiar shape of an
exponential curve. The coordinate change $Y = \log_{10} y$ (a push-forward substitution)
gives

$$Y = \log_{10} y = \log_{10}\left(3 \times 10^{0.1x}\right) = 0.1x + \log_{10} 3,$$

making Y a linear function of x. Its graph is the straight line shown in black, on the
right. For comparison, the graph of $y = 10^x$ is shown in gray. It is also a straight line,
with a slope 10 times steeper than the black graph.

You can verify that the semi-log map,

$$\mathbf{sl}:\quad x = x, \qquad Y = \log_{10} y,$$

transforms the graph of any exponential function $y = b\,a^{kx}$ into the straight line

$$Y = (k\log_{10} a)\,x + \log_{10} b;$$

see the exercises. It compresses the image of a uniform grid more and more in the
vertical direction. In particular, notice that the vertical spacing is $\Delta y = 1$ on the lower
half of the image grid but $\Delta y = 10$ on the upper. (Although the grid immediately
below Y = 0 is not shown, you will find it repeats the same nonuniform pattern
but with a spacing of $\Delta y = 0.1$.) Semi-log paper has two virtues that are commonly
exploited. First, it allows data values that vary over several orders of magnitude to
be plotted in a small space. Second, it makes exponential growth or decline easier
to discern and to quantify, by plotting it on a straight line. You can explore these
features, and the related log-log map, in the exercises.
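The straightening effect of the semi-log map can be confirmed with a short computation. The sketch below is illustrative only (the names `g` and `semi_log_Y` are assumptions): it samples $g(x) = 3 \times 10^{0.1x}$, applies $Y = \log_{10} y$, and checks that equal steps in x produce equal steps in Y, with slope 0.1 and intercept $\log_{10} 3$.

```python
import math

def g(x):
    """The exponential example g(x) = 3 * 10^(0.1 x)."""
    return 3.0 * 10.0 ** (0.1 * x)

def semi_log_Y(x):
    """Vertical coordinate of the point (x, g(x)) after the semi-log map."""
    return math.log10(g(x))

xs = [0.0, 5.0, 10.0, 15.0, 20.0]
Ys = [semi_log_Y(x) for x in xs]
# Slopes between successive sample points; a straight line gives a constant.
slopes = [(Ys[i + 1] - Ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print(Ys)
print(slopes)               # each is 0.1 up to rounding
print(math.log10(3.0))      # the intercept, Y at x = 0
```

Over this range g itself grows from 3 to 300, which illustrates the first virtue mentioned above: two orders of magnitude fit into a vertical span of length 2.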

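For our third example we move to a 2-dimensional source and target, and to the
quadratic map (Chapter 4.2)

$$\mathbf{f}:\ \begin{cases} x = u^2 - v^2,\\ y = 2uv, \end{cases}
\qquad
d\mathbf{f}_{(a,b)}:\ \begin{pmatrix} \Delta x\\ \Delta y \end{pmatrix} =
\begin{pmatrix} 2a & -2b\\ 2b & 2a \end{pmatrix}
\begin{pmatrix} \Delta u\\ \Delta v \end{pmatrix}.$$

Let us see how a suitable coordinate change near an arbitrary point (u,v) = (a,b)
can convert f into its derivative $d\mathbf{f}_{(a,b)}$. As usual, we set

$$\begin{aligned}
\Delta u &= u - a, & \Delta x &= x - (a^2 - b^2),\\
\Delta v &= v - b, & \Delta y &= y - 2ab,
\end{aligned}$$

to get coordinates $(\Delta u, \Delta v)$ in a window centered at (u,v) = (a,b) and coordinates
$(\Delta x, \Delta y)$ in a window centered at the image point $(x,y) = \mathbf{f}(a,b) = (a^2 - b^2, 2ab)$.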
























































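In window coordinates, we represent the map f by the window map $\Delta\mathbf{f}$ (see below,
p. 172) defined by

$$(\Delta x, \Delta y) = \Delta\mathbf{f}(\Delta u, \Delta v) = \mathbf{f}(a + \Delta u, b + \Delta v) - \mathbf{f}(a, b).$$

The formula for $\Delta\mathbf{f}$ is therefore

$$\begin{aligned}
\Delta x &= (a + \Delta u)^2 - (b + \Delta v)^2 - (a^2 - b^2) = 2a\,\Delta u - 2b\,\Delta v + (\Delta u)^2 - (\Delta v)^2,\\
\Delta y &= 2(a + \Delta u)(b + \Delta v) - 2ab = 2b\,\Delta u + 2a\,\Delta v + 2\,\Delta u\,\Delta v.
\end{aligned}$$

Our goal is to change coordinates in the source window,

$$\mathbf{h}:\quad \Delta s = h(\Delta u, \Delta v), \qquad \Delta t = k(\Delta u, \Delta v),$$

so that, in terms of the new coordinates, the formula for $\Delta\mathbf{f}$ becomes the formula
for $d\mathbf{f}_{(a,b)}$. That is, $\Delta\mathbf{f}$ expresses $\Delta x$ and $\Delta y$ as the linear functions

$$\Delta\mathbf{f}:\quad \Delta x = 2a\,\Delta s - 2b\,\Delta t, \qquad \Delta y = 2b\,\Delta s + 2a\,\Delta t.$$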


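Note that we now have two expressions for $\Delta x$ (and, likewise, for $\Delta y$), one
involving $\Delta u$ and $\Delta v$, the other $\Delta s$ and $\Delta t$. To find the functions h and k that connect
$\Delta s$ and $\Delta t$ with $\Delta u$ and $\Delta v$, we can begin by equating those expressions (in matrix
form):

$$\begin{pmatrix} 2a & -2b\\ 2b & 2a \end{pmatrix}\begin{pmatrix} \Delta s\\ \Delta t \end{pmatrix}
= \begin{pmatrix} 2a & -2b\\ 2b & 2a \end{pmatrix}\begin{pmatrix} \Delta u\\ \Delta v \end{pmatrix}
+ \begin{pmatrix} (\Delta u)^2 - (\Delta v)^2\\ 2\,\Delta u\,\Delta v \end{pmatrix}.$$

Then, to solve for $\Delta s$ and $\Delta t$, we need only multiply by the appropriate inverse
matrix:

$$\begin{pmatrix} \Delta s\\ \Delta t \end{pmatrix}
= \begin{pmatrix} \Delta u\\ \Delta v \end{pmatrix}
+ \frac{1}{2(a^2 + b^2)}\begin{pmatrix} a & b\\ -b & a \end{pmatrix}
\begin{pmatrix} (\Delta u)^2 - (\Delta v)^2\\ 2\,\Delta u\,\Delta v \end{pmatrix}.$$

This is the coordinate change we seek; in effect, $\mathbf{h} = (d\mathbf{f}_{\mathbf{a}})^{-1} \circ \Delta\mathbf{f}$. The individual
components of h are

$$\mathbf{h}:\quad
\begin{aligned}
\Delta s &= h(\Delta u, \Delta v) = \Delta u + \frac{a(\Delta u)^2 + 2b\,\Delta u\,\Delta v - a(\Delta v)^2}{2(a^2 + b^2)},\\
\Delta t &= k(\Delta u, \Delta v) = \Delta v + \frac{-b(\Delta u)^2 + 2a\,\Delta u\,\Delta v + b(\Delta v)^2}{2(a^2 + b^2)}.
\end{aligned}$$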


Incidentally, it is not yet evident that h is a coordinate change: that is, that the map
$\mathbf{h}(\Delta u, \Delta v)$ has an inverse defined in some neighborhood W of $(\Delta s, \Delta t) = (0,0)$. In
fact, there is such a local inverse, but rather than go through a proof in this particular
case, we simply appeal to the inverse function theorem, proven later in this chapter.
(In particular, see Corollary 5.4, page 176. It says h will have a local inverse at
$(\Delta s, \Delta t) = (0,0)$ if its derivative is continuous near (0,0) and invertible at (0,0). All
these conditions are satisfied; in particular, $d\mathbf{h}_{(0,0)} = I$, the identity map.)

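By using the vectors

$$\Delta\mathbf{u} = (\Delta u, \Delta v), \quad \Delta\mathbf{s} = (\Delta s, \Delta t), \quad \Delta\mathbf{x} = (\Delta x, \Delta y), \quad \mathbf{a} = (a, b),$$

we can write the formula that connects the maps f, h, and $d\mathbf{f}_{\mathbf{a}}$ as

$$\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}) = \Delta\mathbf{f}(\Delta\mathbf{u}) = \Delta\mathbf{x} = d\mathbf{f}_{\mathbf{a}}(\mathbf{h}(\Delta\mathbf{u})) = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{s}).$$

Think of the formula this way. Each point p in the window centered at a has two
different coordinate labels, $\Delta\mathbf{u}$ and $\Delta\mathbf{s}$. The map h connects those labels. The image
of p under the action of f (i.e., $\Delta\mathbf{f}$) has coordinate $\Delta\mathbf{x}_1 = \Delta\mathbf{f}(\Delta\mathbf{u})$. The image of p
under the action of $d\mathbf{f}_{\mathbf{a}}$ has coordinate $\Delta\mathbf{x}_2 = d\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{s})$. But $\Delta\mathbf{x}_1 = \Delta\mathbf{x}_2$; these are the
coordinates of the same point q. Thus, f (written as $\Delta\mathbf{f}$ in the window W) and $d\mathbf{f}_{\mathbf{a}}$
both map p to q. That is why f “looks like” $d\mathbf{f}_{\mathbf{a}}$; they are just different coordinate
descriptions of the same map. All of this is diagrammed on the left, below, and
summarized more briefly on the right.

$$\begin{array}{ccc}
 & \Delta\mathbf{s} & \\
{\scriptstyle \mathbf{h}}\nearrow & & \searrow{\scriptstyle d\mathbf{f}_{\mathbf{a}}}\\
\Delta\mathbf{u} & \xrightarrow{\ \Delta\mathbf{f}\ } & \Delta\mathbf{x}
\end{array}$$

Thus we have $\Delta\mathbf{f} = d\mathbf{f}_{\mathbf{a}} \circ \mathbf{h}$. If we think of composition of maps as a kind of
product, then we can say $\Delta\mathbf{f}$ factors into h and $d\mathbf{f}_{\mathbf{a}}$. In effect, we constructed the
coordinate change map h so that, in a small window centered at a, $\Delta\mathbf{f}$ factors through h.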

We can get a better idea how the coordinate change h converts $\Delta\mathbf{f}$ (or f) into
$d\mathbf{f}_{(a,b)}$ by focusing on a specific point. In the figure below, we have taken (a,b) =
$(\sqrt{3}/2, 1/2)$ and used windows that measure 1 unit on a side. Thus, the square grid
in the $(\Delta s, \Delta t)$-window at the top has a spacing of 0.1 unit. The same grid appears
in the lower-left window, “pulled back” by h; it becomes curved there because h is
nonlinear. The lower-left window therefore demonstrates concretely what we said
above: that each point in the source has two sets of coordinates. The curved grid
provides $(\Delta s, \Delta t)$ whereas the “native” coordinates (whose square grid is not drawn)
are $(\Delta u, \Delta v)$. Thus, for example,

$$(-0.3, -0.1) = (\Delta s, \Delta t) \leftrightarrow \mathbf{p} \leftrightarrow (\Delta u, \Delta v) \approx (-0.3754, -0.0996).$$
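The pair of labels just quoted, and the factorization of the window map through h, can both be checked numerically. The sketch below is an illustration with hypothetical function names (`window_map`, `derivative`, `h`); it uses the component formulas for h with $(a,b) = (\sqrt{3}/2, 1/2)$.

```python
import math

A, B = math.sqrt(3.0) / 2.0, 0.5      # the base point (a, b)

def f(u, v):
    """The quadratic map."""
    return (u * u - v * v, 2.0 * u * v)

def window_map(du, dv, a=A, b=B):
    """Window map: f(a + du, b + dv) - f(a, b)."""
    x, y = f(a + du, b + dv)
    x0, y0 = f(a, b)
    return (x - x0, y - y0)

def derivative(ds, dt, a=A, b=B):
    """The linear map df_(a,b), with matrix [[2a, -2b], [2b, 2a]]."""
    return (2 * a * ds - 2 * b * dt, 2 * b * ds + 2 * a * dt)

def h(du, dv, a=A, b=B):
    """Coordinate change (du, dv) -> (ds, dt)."""
    n = 2.0 * (a * a + b * b)
    ds = du + (a * du * du + 2 * b * du * dv - a * dv * dv) / n
    dt = dv + (-b * du * du + 2 * a * du * dv + b * dv * dv) / n
    return (ds, dt)

du, dv = -0.3754, -0.0996
print(h(du, dv))                  # approximately (-0.3, -0.1)
print(window_map(du, dv))         # the image under the window map
print(derivative(*h(du, dv)))     # the same point, reached through h and df
```

The last two printed pairs agree to rounding error for every (du, dv), not just this one, because the factorization is an algebraic identity rather than an approximation.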

The curved grid is the key to visualizing both the action of f (as $\Delta\mathbf{f}$) and the
connection between f and its derivative. First, follow f as it maps the source on the
left directly to the target on the right; it sends the curved grid to the grid of large
squares, straightening all grid lines in the process. Second, follow h and $d\mathbf{f}_{(\sqrt{3}/2,\,1/2)}$
into and out of the upper window. This time h itself straightens the curved grid,
mapping it to the “native” grid in the $(\Delta s, \Delta t)$-window. The linear map $d\mathbf{f}_{(\sqrt{3}/2,\,1/2)}$
then carries the $(\Delta s, \Delta t)$ grid to the grid of large squares in the target. Now we
already know that

$$d\mathbf{f}_{(\sqrt{3}/2,\,1/2)} = 2R_{\pi/6}$$

(rotation by $\pi/6$ radians with all lengths doubled; see p. 118), so the large squares
in the $(\Delta x, \Delta y)$-window are 0.2 units on a side and make an angle of 30° with the
horizontal.




The figure has even more to say about the map h that pulls back the $(\Delta s, \Delta t)$
coordinates to the $(\Delta u, \Delta v)$ window. The curves that make up the new grid are, of
course, just contour lines of the two functions

$$\Delta s = h(\Delta u, \Delta v), \qquad \Delta t = k(\Delta u, \Delta v)$$

defined in the window. Contours of h give the roughly vertical curves; contours of k
give the roughly horizontal ones. The following Mathematica 5 code generates the
curved grid in the $(\Delta u, \Delta v)$-window.

(* contours of Delta s = h(Delta u, Delta v) for (a, b) = (Sqrt[3]/2, 1/2) *)
scon = ContourPlot[u + Sqrt[3] (u^2 - v^2)/4 + u v/2,
    {u, -.5, .5}, {v, -.5, .5}, Contours ->
    {-.6, -.5, -.4, -.3, -.2, -.1, 0, .1, .2, .3, .4, .5, .6},
    ContourShading -> False, FrameTicks -> None]

(* contours of Delta t = k(Delta u, Delta v) at the same point *)
tcon = ContourPlot[v + Sqrt[3] u v/2 + (v^2 - u^2)/4,
    {u, -.5, .5}, {v, -.5, .5}, Contours ->
    {-.6, -.5, -.4, -.3, -.2, -.1, 0, .1, .2, .3, .4, .5, .6},
    ContourShading -> False, FrameTicks -> None]

(* superimpose the two families of curves *)
Show[scon, tcon]

The two sets of curves are everywhere orthogonal. This is not automatic. It hap¬ 
pens because the map h is conformal (p. 118); see Exercise 5.19. Note furthermore 
how the axes are pulled back: the A. v-ax is is tangent to the Aw-axis, and the Af-axis 
to the Av-axis. Moreover, the grid squares around the origin undergo almost no dis¬ 
tortion: the pullbacks are nearly the same size and shape as the original. This is a 
consequence of dho = I. 

Because the coordinate lines $\Delta s$ = constant and $\Delta t$ = constant become curved
when they appear in the $(\Delta u, \Delta v)$-window, we say that $\Delta s$ and $\Delta t$ are curvilinear
coordinates there. As we have just seen, curvilinear coordinates can simplify our
view of a map. This is a trade-off, of course: to simplify the map, we complicate
the coordinates. But this is a cost we have already accepted when, for example, we
plot exponential functions on semi-log paper. We have also accepted it when we use
polar coordinates: it was a curved polar coordinate grid that first clarified the action
of the quadratic map $\mathbf{f}$. Here is our earlier view (p. 117) of the local action of $\mathbf{f}$ in a
window centered at $(\sqrt{3}/2, 1/2)$:


[Figure: the local action of $\mathbf{f}$ in a window centered at $(\sqrt{3}/2, 1/2)$]


Compare this figure now with the new one (p. 164) that uses the curvilinear $(\Delta s, \Delta t)$
coordinates in the $(\Delta u, \Delta v)$-window.

Summary: Under a suitable coordinate change, a complicated situation can often 
be made simpler; for example, it may be possible to convert a map locally into its 
derivative. 




5.3 The inverse function theorem 

As we have seen, coordinate changes give us a powerful tool to simplify the description
of a map. But a coordinate change must be invertible, a condition that is often
difficult to verify directly. In this section we state and prove the inverse function
theorem. Simply put, the theorem says that if the derivative of a map is continuous
and invertible at a point, the map itself is invertible locally near that point. The proof
uses several tools, beginning with the contraction mapping principle, which can be
nicely illustrated by the model village found in Bourton-on-the-Water.



The nested models of 
Bourton-on-the-Water 


A more formal view 


Bourton-on-the-Water is in the English Cotswold hills, near Oxford. Filling the 
back garden of one of its houses is a scale model of the whole village. Now every 
point in the model village corresponds to a point in the actual village, so some 
point in the model must correspond exactly to itself. Which one? Because the model 
contains a copy of everything in the village, you would expect it to contain a copy 
of the model itself. It does; you can see it in the foreground of the photo above. The 
model of the model is small, of course; it covers only a few square meters. And that 
smaller model likewise contains a copy of everything, so it has a still smaller copy of 
itself. In theory, the nested copies could go on forever, getting smaller and smaller 
and converging, ultimately, to a single point. However, the third iteration was the 
last that was practical to build. Now return to our question about the point in the 
model that corresponds to itself. It is in the first model, by definition, but it must 
be in the second model, too, and the third, and so on. The point that corresponds to 
itself—the point that is left fixed by the model—must therefore be the limit point of 
the nested sequence of models. 

A little more formally, the model defines a map $\mathbf{m} : V \to V$ of the village, $V$, to
itself, and the point in the model that corresponds to itself is the fixed point of that
map (Definition 5.1, p. 157). The model is built at the scale $\sigma = 1/9$. Thus, if $\mathbf{x}$ and
$\mathbf{y}$ are any points in $V$, and $\mathbf{m}(\mathbf{x})$ and $\mathbf{m}(\mathbf{y})$ are their copies in the model, then

$$\|\mathbf{m}(\mathbf{x}) - \mathbf{m}(\mathbf{y})\| = \sigma\,\|\mathbf{x} - \mathbf{y}\|.$$

Now $\mathbf{x}$ appears in the model of the model at the point

$$\mathbf{m}(\mathbf{m}(\mathbf{x})) = (\mathbf{m} \circ \mathbf{m})(\mathbf{x}) = \mathbf{m}^2(\mathbf{x});$$

it appears in the model of the model of the model at $\mathbf{m}^3(\mathbf{x})$, and so forth. For the $k$th
iterate of the model, the scale factor would be $\sigma^k$:

$$\|\mathbf{m}^k(\mathbf{x}) - \mathbf{m}^k(\mathbf{y})\| = \sigma^k\,\|\mathbf{x} - \mathbf{y}\|.$$

Because $\sigma < 1$, $\sigma^k \to 0$ as $k \to \infty$; the size of the $k$th iterate shrinks to zero.
Intuitively, this forces the nested models to converge to a single point, say $\mathbf{p}$. Now $\mathbf{p}$ is
in every iterate, so it must be the fixed point of the model: $\mathbf{m}(\mathbf{p}) = \mathbf{p}$.
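A small numerical experiment makes the nested models concrete. The Python sketch below is a toy version only: it assumes the model is a pure scaling by $\sigma = 1/9$ followed by a translation into the garden (the real model may also be rotated), and the coordinates in `GARDEN` are invented for illustration. Iterating the map from any starting point in the village homes in on the one point that coincides with its own copy.

```python
import math

SIGMA = 1.0 / 9.0        # scale of the model, as stated in the text
GARDEN = (40.0, -25.0)   # hypothetical: where the copy of the village origin sits

def m(x):
    """Toy scale-model map: shrink by SIGMA, then translate into the garden."""
    return (GARDEN[0] + SIGMA * x[0], GARDEN[1] + SIGMA * x[1])

def iterate(x, k):
    """Return m^k(x), the copy of x in the k-th nested model."""
    for _ in range(k):
        x = m(x)
    return x

def exact_fixed_point():
    """Solve p = GARDEN + SIGMA * p directly, for comparison."""
    return (GARDEN[0] / (1 - SIGMA), GARDEN[1] / (1 - SIGMA))

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

if __name__ == "__main__":
    x, y = (1000.0, 1000.0), (-500.0, 250.0)
    for k in range(6):
        # The distance between the two copies shrinks by SIGMA at each step.
        print(k, dist(iterate(x, k), iterate(y, k)))
    print("limit of iterates:", iterate(x, 30))
    print("exact fixed point:", exact_fixed_point())
```

Both starting points lead to the same limit, which is the behavior the contraction mapping principle guarantees in general.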

Although the contraction mapping principle can be stated quite generally, we
need only the special circumstance of maps defined on a ball in $\mathbb{R}^n$.

Definition 5.2 The ball $B_r$ of radius $r$ centered at the origin in $\mathbb{R}^n$ is the set of all
points $\mathbf{x}$ in $\mathbb{R}^n$ for which $\|\mathbf{x}\| \le r$.

Definition 5.3 A contraction mapping on $B_r$ is a map $\mathbf{m} : B_r \to B_r$ for which

$$\|\mathbf{m}(\mathbf{x}) - \mathbf{m}(\mathbf{y})\| \le \sigma\,\|\mathbf{x} - \mathbf{y}\|$$

for some $\sigma < 1$ and for all $\mathbf{x}$, $\mathbf{y}$ in $B_r$.

A contraction mapping is thus somewhat more general than a “scale model” map,
where all distances are contracted by exactly the same factor. Here the factor can
vary, as long as it is bounded by a fixed $\sigma < 1$. The additional generality does not
weaken the contraction mapping principle, nor does it make the proof more difficult.

Theorem 5.1 (Contraction mapping principle). A contraction mapping $\mathbf{m}$ on $B_r$
has a unique fixed point $\bar{\mathbf{x}}$ in $B_r$. Moreover, for any $\mathbf{x}$ in $B_r$,

$$\bar{\mathbf{x}} = \lim_{k \to \infty} \mathbf{m}^k(\mathbf{x}).$$

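Proof. Pick $\mathbf{x}_0$ arbitrarily in $B_r$, and let $\mathbf{x}_k = \mathbf{m}^k(\mathbf{x}_0)$. For the “telescoping” sum

$$\mathbf{x}_k - \mathbf{x}_{k+l} = \mathbf{x}_k - \mathbf{x}_{k+1} + \mathbf{x}_{k+1} - \mathbf{x}_{k+2} + \cdots + \mathbf{x}_{k+l-1} - \mathbf{x}_{k+l},$$

we have

$$\|\mathbf{x}_k - \mathbf{x}_{k+l}\| \le \|\mathbf{x}_k - \mathbf{x}_{k+1}\| + \|\mathbf{x}_{k+1} - \mathbf{x}_{k+2}\| + \cdots + \|\mathbf{x}_{k+l-1} - \mathbf{x}_{k+l}\|.$$

Now, for any $i \ge 0$,

$$\|\mathbf{x}_{k+i} - \mathbf{x}_{k+i+1}\| = \|\mathbf{m}^{k+i}(\mathbf{x}_0) - \mathbf{m}^{k+i}(\mathbf{x}_1)\| \le \sigma^{k+i}\,\|\mathbf{x}_0 - \mathbf{x}_1\|;$$

therefore the previous inequality implies

$$\begin{aligned}
\|\mathbf{x}_k - \mathbf{x}_{k+l}\| &\le \sigma^k\|\mathbf{x}_0 - \mathbf{x}_1\| + \sigma^{k+1}\|\mathbf{x}_0 - \mathbf{x}_1\| + \cdots + \sigma^{k+l-1}\|\mathbf{x}_0 - \mathbf{x}_1\| \\
&\le \sigma^k\|\mathbf{x}_0 - \mathbf{x}_1\|\left(1 + \sigma + \sigma^2 + \cdots + \sigma^{l-1}\right) \\
&= \sigma^k\|\mathbf{x}_0 - \mathbf{x}_1\|\,\frac{1 - \sigma^l}{1 - \sigma}.
\end{aligned}$$

Because $\sigma^k \to 0$ as $k \to \infty$, the last inequality implies that

$$\lim_{k \to \infty} \|\mathbf{x}_k - \mathbf{x}_{k+l}\| = 0 \quad \text{for any integer } l > 0.$$

(The bound is at most $\sigma^k\|\mathbf{x}_0 - \mathbf{x}_1\|/(1 - \sigma)$, which does not depend on $l$, so the convergence is uniform in $l$.)

Let $x_k^{(j)}$ be the $j$th coordinate of $\mathbf{x}_k$; then

$$\left|x_k^{(j)} - x_{k+l}^{(j)}\right|^2 \le \left|x_k^{(1)} - x_{k+l}^{(1)}\right|^2 + \cdots + \left|x_k^{(n)} - x_{k+l}^{(n)}\right|^2 = \|\mathbf{x}_k - \mathbf{x}_{k+l}\|^2.$$

Thus, for each fixed $l > 0$ and for every $j = 1, 2, \ldots, n$,

$$\lim_{k \to \infty} \left|x_k^{(j)} - x_{k+l}^{(j)}\right| = 0.$$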


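Lemma 5.1. If $y_k$ is a sequence of real numbers for which $|y_k - y_{k+l}| \to 0$ as $k \to \infty$
(uniformly in the positive integer $l$), then $y_k$ has a limiting value, $\bar{y}$, as $k \to \infty$.

Proof. See an analysis text for a proof of this basic fact (“Every Cauchy sequence
of real numbers has a limit”). □

The lemma permits us to define

$$\bar{x}^{(j)} = \lim_{k \to \infty} x_k^{(j)}, \quad j = 1, 2, \ldots, n, \quad \text{and then} \quad \bar{\mathbf{x}} = (\bar{x}^{(1)}, \ldots, \bar{x}^{(n)}).$$

In other words, $\bar{\mathbf{x}} = \lim_{k \to \infty} \mathbf{x}_k$.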

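Lemma 5.2. The contraction map $\mathbf{m} : B_r \to B_r$ is continuous at every point of $B_r$.

Proof. We must show $\mathbf{m}(\mathbf{x}_n) \to \mathbf{m}(\mathbf{x})$ when $\mathbf{x}_n \to \mathbf{x}$. But we have

$$\|\mathbf{m}(\mathbf{x}_n) - \mathbf{m}(\mathbf{x})\| \le \sigma\,\|\mathbf{x}_n - \mathbf{x}\| \to 0.$$

(In fact, the inequality implies that $\mathbf{m}$ is uniformly continuous, even if $\sigma > 1$; see an
analysis text.) □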

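Because $\mathbf{m}$ is continuous (i.e., it commutes with limit processes),

$$\mathbf{m}(\bar{\mathbf{x}}) = \mathbf{m}\left(\lim_{k \to \infty} \mathbf{x}_k\right) = \lim_{k \to \infty} \mathbf{m}(\mathbf{x}_k) = \lim_{k \to \infty} \mathbf{x}_{k+1} = \bar{\mathbf{x}},$$

so $\bar{\mathbf{x}}$ is a fixed point of $\mathbf{m}$. Here is a different argument, which does not depend
explicitly on the continuity of $\mathbf{m}$. It begins with the “telescoping” sum

$$\bar{\mathbf{x}} - \mathbf{m}(\bar{\mathbf{x}}) = \bar{\mathbf{x}} - \mathbf{x}_{k+1} + \mathbf{m}(\mathbf{x}_k) - \mathbf{m}(\bar{\mathbf{x}}),$$

which implies

$$\|\bar{\mathbf{x}} - \mathbf{m}(\bar{\mathbf{x}})\| \le \|\bar{\mathbf{x}} - \mathbf{x}_{k+1}\| + \|\mathbf{m}(\mathbf{x}_k) - \mathbf{m}(\bar{\mathbf{x}})\| \le \|\bar{\mathbf{x}} - \mathbf{x}_{k+1}\| + \sigma\,\|\mathbf{x}_k - \bar{\mathbf{x}}\|,$$

an inequality that is true for all $k$. But because $\mathbf{x}_k \to \bar{\mathbf{x}}$ as $k \to \infty$, the right-hand side
vanishes as $k \to \infty$, leaving $\|\bar{\mathbf{x}} - \mathbf{m}(\bar{\mathbf{x}})\| = 0$, or $\bar{\mathbf{x}} = \mathbf{m}(\bar{\mathbf{x}})$.

It remains to verify that the fixed point is unique. If $\bar{\mathbf{y}}$ is also a fixed point, then

$$\|\bar{\mathbf{x}} - \bar{\mathbf{y}}\| = \|\mathbf{m}(\bar{\mathbf{x}}) - \mathbf{m}(\bar{\mathbf{y}})\| \le \sigma\,\|\bar{\mathbf{x}} - \bar{\mathbf{y}}\|.$$

If $\|\bar{\mathbf{x}} - \bar{\mathbf{y}}\| \ne 0$, we can divide both sides of this inequality by it and get $1 \le \sigma$,
contradicting our assumption that $\sigma < 1$. This forces $\bar{\mathbf{x}} = \bar{\mathbf{y}}$. □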
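The proof is constructive, and it also yields an error estimate: letting $l \to \infty$ in the telescoping bound gives $\|\mathbf{x}_k - \bar{\mathbf{x}}\| \le \sigma^k \|\mathbf{x}_0 - \mathbf{x}_1\| / (1 - \sigma)$. The Python sketch below uses that estimate as a stopping rule. The particular map `m` is an invented example, not one from the text: it sends the unit ball in the plane into itself, and the entries of its derivative are bounded so that $\sigma = 1/2$ works as a contraction factor.

```python
import math

SIGMA = 0.5  # contraction factor for the example map below (an assumption
             # justified by the bound on its partial derivatives)

def m(p):
    """Example contraction of the unit ball: |m(p)| <= sqrt(0.25 + 0.36) < 1."""
    x, y = p
    return (0.5 * math.cos(y), 0.5 * math.sin(x) + 0.1)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fixed_point(x0, tol=1e-12, max_iter=200):
    """Iterate x_k = m^k(x0) until the a priori bound
    sigma^k * |x0 - x1| / (1 - sigma) guarantees |x_k - xbar| < tol.
    Returns (x_k, k)."""
    d01 = dist(x0, m(x0))
    xk = x0
    for k in range(max_iter):
        if SIGMA**k * d01 / (1 - SIGMA) < tol:
            return xk, k
        xk = m(xk)
    raise RuntimeError("bound not reached within max_iter iterations")

if __name__ == "__main__":
    p, k = fixed_point((0.9, 0.0))
    q, _ = fixed_point((-0.3, 0.6))
    print("fixed point from first start: ", p, "after", k, "iterations")
    print("fixed point from second start:", q)
    print("|m(p) - p| =", dist(m(p), p))
```

Starting from two different points of the ball illustrates both conclusions of the theorem: the iterates converge, and they converge to the same point.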

Theorem 5.2 (Inverse function theorem). Suppose $\mathbf{f} : U^n \to \mathbb{R}^n$ is continuously
differentiable on $U^n$, and its derivative is invertible at the point $\mathbf{a}$ in $U^n$. Then $\mathbf{f}$
itself is invertible on the image $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B)$ in the target of some ball $B$ of positive radius
centered at the point $\mathbf{a}$. The inverse is continuously differentiable on its domain, and
$\mathrm{d}(\mathbf{f}^{-1})_{\mathbf{q}} = (\mathrm{d}\mathbf{f}_{\mathbf{f}^{-1}(\mathbf{q})})^{-1}$ for all $\mathbf{q}$ in $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B)$.

Proof. Our proof expands an argument found in Lang [10, 11]. The proof is long;
therefore we split it into a number of steps.


[Figure: the source $U^n$ and the target $\mathbb{R}^n$ of $\mathbf{f}$]



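The inverse is a local object, that is, one defined essentially in some window
centered at the image point $\mathbf{f}(\mathbf{a})$. Therefore, we begin by setting up windows and
introducing window coordinates. Let $W_{\mathbf{a}}$ be a window centered at $\mathbf{a}$ in the source,
with coordinate $\Delta\mathbf{u} = \mathbf{u} - \mathbf{a}$. Similarly, let $W_{\mathbf{f}(\mathbf{a})}$ be a window in the target centered
at $\mathbf{f}(\mathbf{a})$; its coordinate is $\Delta\mathbf{x} = \mathbf{x} - \mathbf{f}(\mathbf{a})$. Finding $\mathbf{f}^{-1}$ means solving the equation

$$\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}) = \Delta\mathbf{x}$$

for $\Delta\mathbf{u}$ in $W_{\mathbf{a}}$, given $\Delta\mathbf{x}$ in some suitable region in $W_{\mathbf{f}(\mathbf{a})}$. In fact, the $\Delta\mathbf{u}$ we seek is a
fixed point of the map

$$\mathbf{g}(\Delta\mathbf{u}) = \Delta\mathbf{u} + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}\left[\Delta\mathbf{x} - \left(\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a})\right)\right].$$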


Step 1 
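Step 2


[Diagram: $W_{\mathbf{a}} \xrightarrow{\ \Delta\mathbf{f}\ } W_{\mathbf{f}(\mathbf{a})} \xrightarrow{\ (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}\ } W_{\mathbf{a}}$, with composite $\varphi$]
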

Step 3 


Step 4 


Step 5 


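The bulk of the proof of invertibility involves showing that $\mathbf{g}$ is a contraction
mapping on a suitable ball $B_r$, implying that $\Delta\mathbf{u}$ exists. The map $(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}$ is needed here
to bring the point $\Delta\mathbf{x} - (\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}))$ from $W_{\mathbf{f}(\mathbf{a})}$ back to $W_{\mathbf{a}}$, where it can be
added to $\Delta\mathbf{u}$.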

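Our analysis of $\mathbf{g}$ begins with the portion

$$\varphi(\Delta\mathbf{u}) = (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}\left(\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a})\right),$$

which is a map $\varphi : W_{\mathbf{a}} \to W_{\mathbf{a}}$. By the chain rule,

$$\mathrm{d}\varphi_{\Delta\mathbf{v}} = (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1} \circ \mathrm{d}\mathbf{f}_{\mathbf{a} + \Delta\mathbf{v}}$$

for every $\Delta\mathbf{v}$ in $W_{\mathbf{a}}$. By hypothesis, $\mathrm{d}\mathbf{f}_{\mathbf{a} + \Delta\mathbf{v}}$ depends continuously on $\Delta\mathbf{v}$; therefore,
$\mathrm{d}\varphi_{\Delta\mathbf{v}}$ depends continuously on $\Delta\mathbf{v}$, as well. Furthermore, $\mathrm{d}\varphi_{\mathbf{0}} = (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1} \circ \mathrm{d}\mathbf{f}_{\mathbf{a}} = I$.
Define $\mathbf{h} : W_{\mathbf{a}} \to W_{\mathbf{a}}$ by

$$\mathbf{h}(\Delta\mathbf{u}) = \Delta\mathbf{u} - \varphi(\Delta\mathbf{u});$$

then $\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}} = I - \mathrm{d}\varphi_{\Delta\mathbf{v}}$ is likewise continuous as a function of $\Delta\mathbf{v}$. It follows that the
real-valued function

$$N(\Delta\mathbf{v}) = \max_{\|\Delta\mathbf{u}\| = 1} \|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{u})\|$$

is a continuous function of $\Delta\mathbf{v}$, as well. Because $\mathrm{d}\mathbf{h}_{\mathbf{0}} = 0$, the zero linear map, we
have $N(\mathbf{0}) = 0$, and continuity implies that $N(\Delta\mathbf{v})$ will be small when $\Delta\mathbf{v}$ is small.
Specifically, choose $r > 0$ so that

$$\|\Delta\mathbf{v}\| \le 2r \implies |N(\Delta\mathbf{v})| \le \tfrac{1}{2}.$$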

The value of $r$ now set in Step 3 allows us to specify both the $\Delta\mathbf{x}$-values that we
allow (for the domain of $\mathbf{f}^{-1}$) and the domain of $\mathbf{g}$ itself. First, we require that $\Delta\mathbf{x}$ be
in the image $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B_{r/2})$ ($B_{r/2}$ is the ball of radius $r/2$ at the center of $W_{\mathbf{a}}$); this means

$$\|(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x})\| \le r/2.$$

Second, we require that the domain of $\mathbf{g}$ be restricted to $B_r$; this means

$$\|\Delta\mathbf{u}\| \le r.$$

The next three steps show that $\mathbf{g}$ maps $B_r$ to itself and is a contraction mapping there.

Because $\mathbf{g} = \mathbf{h} + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x})$, that is, $\mathbf{g}$ and $\mathbf{h}$ just “differ by a constant,” most of
what we still need to prove about $\mathbf{g}$ can be done by working with $\mathbf{h}$.

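Lemma 5.3. If $\|\Delta\mathbf{v}\| \le 2r$, then $\|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{w})\| \le \tfrac{1}{2}\|\Delta\mathbf{w}\|$ for all $\Delta\mathbf{w}$.

Proof. We can assume $\Delta\mathbf{w}$ is nonzero; then $\Delta\mathbf{u} = \Delta\mathbf{w}/\|\Delta\mathbf{w}\|$ is a unit vector and
$\Delta\mathbf{w} = \|\Delta\mathbf{w}\|\,\Delta\mathbf{u}$. Because $\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}$ is linear, we have

$$\|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{w})\| = \|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\|\Delta\mathbf{w}\|\,\Delta\mathbf{u})\| = \|\Delta\mathbf{w}\|\,\|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{u})\| \le \|\Delta\mathbf{w}\|\,N(\Delta\mathbf{v}) \le \tfrac{1}{2}\|\Delta\mathbf{w}\|.$$

We have used the fact that $\|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{u})\| \le N(\Delta\mathbf{v})$ for any $\Delta\mathbf{v}$ and for all unit vectors
$\Delta\mathbf{u}$, by the definition of $N$ (Step 2). □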

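Suppose $\Delta\mathbf{u}_1$ and $\Delta\mathbf{u}_2$ are in the ball of radius $r$; then the entire line segment
$\Delta\mathbf{v} = \Delta\mathbf{u}_1 + t(\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1)$ ($0 \le t \le 1$) is likewise, and we have

$$\|\mathrm{d}\mathbf{h}_{\Delta\mathbf{u}_1 + t(\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1)}(\Delta\mathbf{w})\| \le \tfrac{1}{2}\|\Delta\mathbf{w}\|$$

for all $\Delta\mathbf{w}$. This inequality and the continuous differentiability of $\mathbf{h}$ (Step 2) allow
us to use the mean value theorem for maps (Theorem 4.15, p. 140) to conclude

$$\|\mathbf{h}(\Delta\mathbf{u}_2) - \mathbf{h}(\Delta\mathbf{u}_1)\| \le \tfrac{1}{2}\|\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1\|.$$

Moreover, if we set $\Delta\mathbf{u}_1 = \mathbf{0}$, then $\|\mathbf{h}(\Delta\mathbf{u}_2)\| \le \tfrac{1}{2}\|\Delta\mathbf{u}_2\| \le r/2$. In other words, $\mathbf{h}$
maps the ball of radius $r$ into the ball of radius $r/2$.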

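We now move on to $\mathbf{g}$ itself.

Lemma 5.4. For any $\Delta\mathbf{x}$ in $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B_{r/2})$, $\mathbf{g} : B_r \to B_r$.

Proof. Because $\Delta\mathbf{u}$ is in $B_r$, by hypothesis, we have $\|\mathbf{h}(\Delta\mathbf{u})\| \le r/2$ (Step 6). Also
by hypothesis, $\|(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x})\| \le r/2$, so

$$\|\mathbf{g}(\Delta\mathbf{u})\| = \|\mathbf{h}(\Delta\mathbf{u}) + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x})\| \le \|\mathbf{h}(\Delta\mathbf{u})\| + \|(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x})\| \le r. \qquad □$$

Lemma 5.5. For any $\Delta\mathbf{x}$ in $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B_{r/2})$, $\mathbf{g}$ is a contraction mapping on $B_r$.

Proof. Because $\mathbf{g}(\Delta\mathbf{u}) = \mathbf{h}(\Delta\mathbf{u}) + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x})$, it follows that

$$\mathbf{g}(\Delta\mathbf{u}_2) - \mathbf{g}(\Delta\mathbf{u}_1) = \mathbf{h}(\Delta\mathbf{u}_2) - \mathbf{h}(\Delta\mathbf{u}_1);$$

therefore, by Step 6,

$$\|\mathbf{g}(\Delta\mathbf{u}_2) - \mathbf{g}(\Delta\mathbf{u}_1)\| = \|\mathbf{h}(\Delta\mathbf{u}_2) - \mathbf{h}(\Delta\mathbf{u}_1)\| \le \tfrac{1}{2}\|\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1\|. \qquad □$$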

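Let $\Delta\mathbf{x}$ be an arbitrary point in $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B_{r/2})$. This choice determines a specific map
$\mathbf{g} : B_r \to B_r$, and $\mathbf{g}$ has a unique fixed point $\Delta\mathbf{u}$ in $B_r$, by the contraction mapping
principle. Because $\Delta\mathbf{x}$ determines $\Delta\mathbf{u}$ uniquely, and

$$\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}) = \Delta\mathbf{x},$$

we now have the required inverse map $\Delta\mathbf{f}^{-1} : \mathrm{d}\mathbf{f}_{\mathbf{a}}(B_{r/2}) \to B_r : \Delta\mathbf{x} \mapsto \Delta\mathbf{u}$.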
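The construction so far is effectively an algorithm: fix $\Delta\mathbf{x}$, then iterate $\mathbf{g}$ starting from $\Delta\mathbf{u} = \mathbf{0}$. The Python sketch below tries this on the quadratic map at the point $\mathbf{a} = (\sqrt{3}/2, 1/2)$. It assumes the quadratic map is $\mathbf{f}(u, v) = (u^2 - v^2, 2uv)$, a formula that is not restated in this section but is consistent with $\mathrm{d}\mathbf{f}_{\mathbf{a}} = 2R_{\pi/6}$; the function names and the sample $\Delta\mathbf{x}$ are chosen here for illustration.

```python
import math

A = (math.sqrt(3.0) / 2, 0.5)   # the point a in the source

def f(u, v):
    """Assumed form of the quadratic map."""
    return (u * u - v * v, 2 * u * v)

def dfa_inv(x, y):
    """(df_a)^(-1): rotation by -pi/6 with all lengths halved."""
    c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
    return (0.5 * (c * x + s * y), 0.5 * (-s * x + c * y))

def window_inverse(dx, dy, tol=1e-13, max_iter=100):
    """Solve f(a + du) - f(a) = dx by iterating
    g(du) = du + (df_a)^(-1)[dx - (f(a + du) - f(a))], starting at du = 0."""
    fa = f(*A)
    du, dv = 0.0, 0.0
    for _ in range(max_iter):
        fx, fy = f(A[0] + du, A[1] + dv)
        cu, cv = dfa_inv(dx - (fx - fa[0]), dy - (fy - fa[1]))
        du, dv = du + cu, dv + cv          # this is g applied to (du, dv)
        if math.hypot(cu, cv) < tol:       # g moved the point by less than tol
            return du, dv
    raise RuntimeError("iteration did not settle; dx may be too large")

if __name__ == "__main__":
    dx, dy = 0.05, -0.03
    du, dv = window_inverse(dx, dy)
    fa, fu = f(*A), f(A[0] + du, A[1] + dv)
    print("du =", (du, dv))
    print("residual:", (fu[0] - fa[0] - dx, fu[1] - fa[1] - dy))
```

The iteration never inverts $\mathbf{f}$ directly; it only evaluates $\mathbf{f}$ and applies the fixed linear map $(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}$, which is what makes the fixed-point formulation useful.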

Before showing that $\mathbf{f}^{-1}$ is continuously differentiable, we pause to call attention
to the relation between a map and the way we write it within a window. For example,
in $W_{\mathbf{a}}$ and $W_{\mathbf{f}(\mathbf{a})}$, the equation $\mathbf{x} = \mathbf{f}(\mathbf{u})$ has the form


Step 6 


Step 7 


Step 8 


Step 9 


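$$\mathbf{f}(\mathbf{a}) + \underline{\Delta\mathbf{x}} = \mathbf{x} = \mathbf{f}(\mathbf{u}) = \mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) = \mathbf{f}(\mathbf{a}) + \underline{\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a})}.$$
Window map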
and equation 


Step 10 


Step 11 


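The underlined elements are equal (this is the “window equation”), and they
define the window map $\Delta\mathbf{f}$ for $\mathbf{f}$:

$$\Delta\mathbf{x} = \Delta\mathbf{f}(\Delta\mathbf{u}) = \mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a}).$$

Conversely, we can reconstruct the original formula $\mathbf{x} = \mathbf{f}(\mathbf{u})$ from its window
equation $\Delta\mathbf{x} = \Delta\mathbf{f}(\Delta\mathbf{u})$. Furthermore, by solving the window equation for $\Delta\mathbf{u}$, we obtain
the window equation of the inverse $\mathbf{u} = \mathbf{f}^{-1}(\mathbf{x})$ (at the point $\mathbf{b} = \mathbf{f}(\mathbf{a})$):

$$\Delta\mathbf{u} = \Delta\mathbf{f}^{-1}(\Delta\mathbf{x}) = \mathbf{f}^{-1}(\mathbf{b} + \Delta\mathbf{x}) - \mathbf{f}^{-1}(\mathbf{b}).$$

In preparation for showing $\mathbf{f}^{-1}$ is differentiable, we first show that it is uniformly
continuous, by working with the window map $\Delta\mathbf{f}^{-1}$.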

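Lemma 5.6. There is a positive constant $K$ such that, for any two points $\Delta\mathbf{x}_1$, $\Delta\mathbf{x}_2$
and their corresponding images $\Delta\mathbf{u}_1$, $\Delta\mathbf{u}_2$ under $\Delta\mathbf{f}^{-1}$,

$$\|\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1\| \le K\,\|\Delta\mathbf{x}_2 - \Delta\mathbf{x}_1\|.$$

Proof. Recall the definition of the map $\mathbf{h}$ (now written using the window map for $\mathbf{f}$):

$$\mathbf{h}(\Delta\mathbf{u}) = \Delta\mathbf{u} - \varphi(\Delta\mathbf{u}) = \Delta\mathbf{u} - (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{f}(\Delta\mathbf{u})) = \Delta\mathbf{u} - (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x}).$$

If we now evaluate this equation at $\Delta\mathbf{u}_1$ and then at $\Delta\mathbf{u}_2$, subtract the first from the
second, and use the linearity of $(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}$, we get

$$\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1 = \mathbf{h}(\Delta\mathbf{u}_2) - \mathbf{h}(\Delta\mathbf{u}_1) + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x}_2 - \Delta\mathbf{x}_1).$$

It follows that

$$\begin{aligned}
\|\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1\| &\le \|\mathbf{h}(\Delta\mathbf{u}_2) - \mathbf{h}(\Delta\mathbf{u}_1)\| + \|(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\Delta\mathbf{x}_2 - \Delta\mathbf{x}_1)\| \\
&\le \tfrac{1}{2}\|\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1\| + C\,\|\Delta\mathbf{x}_2 - \Delta\mathbf{x}_1\|
\end{aligned}$$

for some positive $C$ (see Exercise 3.28, p. 104). The first term on the right side of
the second inequality is a consequence of the contraction property of $\mathbf{h}$ (Step 6). A
final subtraction gives

$$\tfrac{1}{2}\|\Delta\mathbf{u}_2 - \Delta\mathbf{u}_1\| \le C\,\|\Delta\mathbf{x}_2 - \Delta\mathbf{x}_1\|,$$

implying we can take $K = 2C$. □

The lemma establishes that $\Delta\mathbf{f}^{-1}$ is uniformly continuous (see the comment in
the proof of Lemma 5.2, above).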

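Because we claim the derivative of $\mathbf{f}^{-1}$ at the point $\mathbf{q}$ will be $(\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}$, where
$\mathbf{p} = \mathbf{f}^{-1}(\mathbf{q})$ is a point in the ball $B_r$, we must first show $\mathrm{d}\mathbf{f}_{\mathbf{p}}$ is invertible.

Lemma 5.7. Suppose $\mathbf{p} = \mathbf{a} + \Delta\mathbf{v}$ for some $\|\Delta\mathbf{v}\| \le r$ (i.e., $\mathbf{p}$ is in $B_r$); then $\mathrm{d}\mathbf{f}_{\mathbf{p}}$ is
invertible.

Proof. From the definition

$$\mathbf{h}(\Delta\mathbf{u}) = \Delta\mathbf{u} - (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}\left(\mathbf{f}(\mathbf{a} + \Delta\mathbf{u}) - \mathbf{f}(\mathbf{a})\right)$$

in Step 2, it follows that

$$\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}} = I - (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1} \circ \mathrm{d}\mathbf{f}_{\mathbf{a} + \Delta\mathbf{v}}, \quad \text{or} \quad I = \mathrm{d}\mathbf{h}_{\Delta\mathbf{v}} + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1} \circ \mathrm{d}\mathbf{f}_{\mathbf{p}}.$$

Therefore, when the maps in the last equation are supplied with input $\Delta\mathbf{u}$, we get

$$\Delta\mathbf{u} = \mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{u}) + (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u})),$$

and hence

$$\begin{aligned}
\|\Delta\mathbf{u}\| &\le \|\mathrm{d}\mathbf{h}_{\Delta\mathbf{v}}(\Delta\mathbf{u})\| + \|(\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1}(\mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u}))\| \\
&\le \tfrac{1}{2}\|\Delta\mathbf{u}\| + C\,\|\mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u})\|
\end{aligned}$$

for some $C > 0$, exactly as in Lemma 5.6. A bit of algebra now gives

$$\frac{1}{2C}\|\Delta\mathbf{u}\| \le \|\mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u})\|.$$

This inequality implies $\mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{u}) \ne \mathbf{0}$ when $\Delta\mathbf{u} \ne \mathbf{0}$. In other words, the kernel of $\mathrm{d}\mathbf{f}_{\mathbf{p}}$
contains only $\mathbf{0}$, so $\mathrm{d}\mathbf{f}_{\mathbf{p}}$ is invertible. □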

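We now show $\mathbf{f}^{-1}$ is differentiable at an arbitrary point $\mathbf{q}$ in the domain $\mathrm{d}\mathbf{f}_{\mathbf{a}}(B_{r/2})$,
and its derivative is $(\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}$ there ($\mathbf{p} = \mathbf{f}^{-1}(\mathbf{q})$). We work in windows $W_{\mathbf{q}}$ and $W_{\mathbf{p}}$ with
local coordinates $\Delta\mathbf{y}$ and $\Delta\mathbf{v}$, respectively, and with the window equations

$$\Delta\mathbf{v} = \Delta\mathbf{f}^{-1}(\Delta\mathbf{y}) = \mathbf{f}^{-1}(\mathbf{q} + \Delta\mathbf{y}) - \mathbf{f}^{-1}(\mathbf{q}),$$

$$\Delta\mathbf{y} = \Delta\mathbf{f}(\Delta\mathbf{v}) = \mathbf{f}(\mathbf{p} + \Delta\mathbf{v}) - \mathbf{f}(\mathbf{p}).$$

To prove $\mathbf{f}^{-1}$ is differentiable at $\mathbf{q}$, and has derivative $(\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}$ there, it is necessary
and sufficient to show $\mathbf{R}(\Delta\mathbf{y}) = o(\Delta\mathbf{y})$, where

$$\Delta\mathbf{v} = \Delta\mathbf{f}^{-1}(\Delta\mathbf{y}) = (\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}(\Delta\mathbf{y}) + \mathbf{R}(\Delta\mathbf{y}).$$

To analyze $\mathbf{R}$, apply $\mathrm{d}\mathbf{f}_{\mathbf{p}}$ to both sides of this equation,

$$\mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{v}) = \Delta\mathbf{y} + \mathrm{d}\mathbf{f}_{\mathbf{p}}(\mathbf{R}(\Delta\mathbf{y})),$$

and then solve for $\Delta\mathbf{y} = \Delta\mathbf{f}(\Delta\mathbf{v})$ to get

$$\Delta\mathbf{f}(\Delta\mathbf{v}) = \mathrm{d}\mathbf{f}_{\mathbf{p}}(\Delta\mathbf{v}) - \underbrace{\mathrm{d}\mathbf{f}_{\mathbf{p}}(\mathbf{R}(\Delta\mathbf{y}))}_{o(\Delta\mathbf{v})}.$$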


Step 12 
Step 13


The role of continuous
differentiability


$f'(x)$ exists but is not
continuous at $x = 0$


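This equation expresses the differentiability of $\mathbf{f}$ itself at $\mathbf{p}$, so the last term must be
$o(\Delta\mathbf{v})$, as indicated. Because $L(o(\mathbf{u})) = o(\mathbf{u})$ for any linear map $L$ (cf. the proof of
Lemma 4.1, p. 133),

$$\mathbf{R}(\Delta\mathbf{y}) = (\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}(o(\Delta\mathbf{v})) = o(\Delta\mathbf{v}).$$

However, we need $o(\Delta\mathbf{y})$ on the right, not just $o(\Delta\mathbf{v})$. The uniform continuity of
$\Delta\mathbf{f}^{-1}$ (Lemma 5.6) in this setting implies $\|\Delta\mathbf{v}\| \le K\|\Delta\mathbf{y}\|$. Therefore, as $\Delta\mathbf{y} \to \mathbf{0}$, we
also have $\Delta\mathbf{v} \to \mathbf{0}$ and

$$\frac{\|\mathbf{R}(\Delta\mathbf{y})\|}{\|\Delta\mathbf{y}\|} \le K\,\frac{\|\mathbf{R}(\Delta\mathbf{y})\|}{\|\Delta\mathbf{v}\|} \to 0.$$

(Lemma 4.2, p. 133 makes essentially the same point.) This proves that $\mathbf{R}(\Delta\mathbf{y}) = o(\Delta\mathbf{y})$
and thus that $\mathbf{f}^{-1}$ is differentiable at $\mathbf{q}$.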

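The last fact to prove is that the derivative $\mathrm{d}(\mathbf{f}^{-1})_{\mathbf{q}}$ depends continuously on $\mathbf{q}$.
But $\mathrm{d}(\mathbf{f}^{-1})_{\mathbf{q}} = (\mathrm{d}\mathbf{f}_{\mathbf{f}^{-1}(\mathbf{q})})^{-1} = (\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}$; therefore we can use the following chain of
argument.

• The entries of the $n \times n$ matrix $(\mathrm{d}\mathbf{f}_{\mathbf{p}})^{-1}$ are rational functions of the entries of
$\mathrm{d}\mathbf{f}_{\mathbf{p}}$ (polynomials divided by the nonzero determinant of $\mathrm{d}\mathbf{f}_{\mathbf{p}}$), and hence depend
continuously on them.

• $\mathrm{d}\mathbf{f}_{\mathbf{p}}$ depends continuously on $\mathbf{p}$.

• $\mathbf{p} = \mathbf{f}^{-1}(\mathbf{q})$ depends continuously on $\mathbf{q}$.

This completes the proof of the inverse function theorem. □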

Corollary 5.3 Suppose $\mathbf{f} : U^n \to \mathbb{R}^n$ satisfies the conditions of the inverse function
theorem at the point $\mathbf{a}$ in $U^n$. Then the image $\mathbf{f}(U^n)$ contains a ball $B$ of positive
radius centered at the point $\mathbf{f}(\mathbf{a})$ in the target of $\mathbf{f}$.

Proof. The conditions of the inverse function theorem apply to the inverse map $\mathbf{f}^{-1}$
at $\mathbf{f}(\mathbf{a})$. For $B$ take the ball provided by the theorem. □

The inverse function theorem assumes that the derivative is continuous. This
hypothesis is invoked, for example, at the point in the proof where the mean value
theorem for maps is used (Steps 6 and 7) to show that $\mathbf{g}$ is a contraction mapping.
But is continuity necessary? Our proof needs it, but does the theorem itself? Can we
find a better proof that dispenses with that hypothesis?

In fact, the hypothesis is indispensable: there are differentiable functions that
have an invertible derivative at a point but are not themselves invertible on any open
neighborhood of that point. One such example is

$$f(x) = \frac{x}{2} + \frac{x^2}{2}\sin\frac{\pi}{x} \ \text{ if } x \ne 0, \qquad f(0) = 0.$$

The function undergoes infinitely many oscillations near $x = 0$, but because the
graph is squeezed between the parabolas $y = (x \pm x^2)/2$, it follows that $f'(0) = \tfrac{1}{2}$.
The derivative map $\mathrm{d}f_0(\Delta x) = \tfrac{1}{2}\Delta x$ is thus invertible. Now consider $f'(x)$ for $x \ne 0$;
a direct computation gives

$$f'(x) = \frac{1}{2} + x\sin\frac{\pi}{x} - \frac{\pi}{2}\cos\frac{\pi}{x}.$$

At the points $1/n \to 0$, $f'(1/n)$ is alternately positive and negative. In particular,
$f'(1/2n) = (1 - \pi)/2$, so

$$\lim_{n \to \infty} f'\!\left(\frac{1}{2n}\right) = \frac{1 - \pi}{2} \ne \frac{1}{2} = f'(0).$$

Thus, although $f$ is differentiable everywhere, that derivative is not continuous at
the origin.

[Figure: graph of $y = f(x)$ near $x = 0$]

However, $f'$ is continuous away from the origin, so it must change sign at some
point $c_n$ between $1/(n+1)$ and $1/n$. That is, $f'(c_n) = 0$; in fact the $c_n$ are alternately
local maxima and minima of $f$. Because $1/n$ is an infinite sequence that converges to
0, the same must be true of the interlaced sequence $c_n$. Near each local extremum $c_n$,
$f$ is a 2-1 map. Inasmuch as $c_n \to 0$, there is no open interval around $x = 0$ on which
$f$ is 1-1. In other words, the oscillations make $f$ noninvertible near the origin.
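The sign changes of the derivative are easy to observe numerically. The following Python sketch assumes the example is $f(x) = x/2 + (x^2/2)\sin(\pi/x)$, the form consistent with the bounding parabolas $y = (x \pm x^2)/2$ and with the value $f'(1/2n) = (1 - \pi)/2$; the helper `critical_point` is a name introduced here, and it simply bisects $f'$ on the interval between $1/(n+1)$ and $1/n$.

```python
import math

def f(x):
    """The example function, with f(0) = 0."""
    return 0.0 if x == 0 else x / 2 + (x * x / 2) * math.sin(math.pi / x)

def fprime(x):
    """Derivative of f for x != 0 (product and chain rules)."""
    return 0.5 + x * math.sin(math.pi / x) - (math.pi / 2) * math.cos(math.pi / x)

def critical_point(n, steps=60):
    """Locate a zero c_n of f' between 1/(n+1) and 1/n by bisection.
    Relies on f' having opposite signs at the two endpoints."""
    lo, hi = 1.0 / (n + 1), 1.0 / n
    f_lo = fprime(lo)
    for _ in range(steps):
        mid = (lo + hi) / 2
        f_mid = fprime(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # sign change is in the upper half
        else:
            hi = mid                # sign change is in the lower half
    return (lo + hi) / 2

if __name__ == "__main__":
    for n in range(1, 9):
        c = critical_point(n)
        print(n, "f'(1/n) =", round(fprime(1.0 / n), 6), " c_n =", c)
    # The difference quotient at 0 still tends to 1/2:
    for h in (1e-2, 1e-4, 1e-6):
        print(h, f(h) / h)
```

The printed values of $f'(1/n)$ alternate between two fixed numbers of opposite sign, while the difference quotient at the origin settles near $1/2$; together these show a derivative that exists at 0 but is not the limit of nearby derivatives.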

When will
$\mathbf{f}$ “look like” $\mathrm{d}\mathbf{f}_{\mathbf{a}}$?

In our analysis of several examples of maps of the plane in Chapter 4.2, we
found that a map usually “looked like” its derivative locally. When we returned
to the quadratic map above (pp. 161-165), we actually converted the map into its
derivative within a window by expressing the derivative in terms of appropriate
curvilinear coordinates. The curvilinear coordinates were supplied by a map $\mathbf{h}$ for
which

$$\Delta\mathbf{f} = \mathrm{d}\mathbf{f}_{\mathbf{a}} \circ \mathbf{h}.$$

We claimed that the new variables could indeed serve as coordinates; that is, we
claimed, in effect, that $\mathbf{h}$ was invertible. Because $\mathrm{d}\mathbf{f}_{\mathbf{a}}$ was obviously invertible, we
defined $\mathbf{h}$ as

$$\mathbf{h} = (\mathrm{d}\mathbf{f}_{\mathbf{a}})^{-1} \circ \Delta\mathbf{f}.$$
Comparison with
Taylor’s theorem


But this, in itself, does not prove $\mathbf{h}$ invertible. Now, however, we can settle the
question: If $\mathbf{f}$ is continuously differentiable (and thus invertible in a neighborhood
of $\mathbf{a}$, by the inverse function theorem), then $\mathbf{h}$ is invertible and

$$\mathbf{h}^{-1} = \Delta\mathbf{f}^{-1} \circ \mathrm{d}\mathbf{f}_{\mathbf{a}}.$$

This discussion leads to the following corollary of the inverse function theorem.

Corollary 5.4 If $\mathbf{f} : U^n \to \mathbb{R}^n$ is continuously differentiable on an open
neighborhood $U^n$ of $\mathbf{a}$, and $\mathrm{d}\mathbf{f}_{\mathbf{a}}$ is invertible, then there is a coordinate change $\mathbf{h} : V^n \to \mathbb{R}^n$
on some possibly smaller neighborhood $V^n$ of $\mathbf{a}$ for which $\Delta\mathbf{f} = \mathrm{d}\mathbf{f}_{\mathbf{a}} \circ \mathbf{h}$. □

Before leaving the inverse function theorem, let us compare it with Taylor’s
theorem as a tool for understanding the geometric action of a map. Suppose we use
Taylor’s theorem to expand the map $\mathbf{f} : U^n \to \mathbb{R}^n$ at a point $\mathbf{a}$:

$$\Delta\mathbf{x} = \Delta\mathbf{f}(\Delta\mathbf{u}) = \mathrm{d}\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u}) + O(2).$$

This equation was the basis for our frequent observation, in Chapter 4, that the
derivative $\mathrm{d}\mathbf{f}_{\mathbf{a}}$ approximates $\mathbf{f}$ in a sufficiently small window centered at $\mathbf{a}$. The
equation gives only an approximation because the difference $O(2)$ between $\Delta\mathbf{f}$ and $\mathrm{d}\mathbf{f}_{\mathbf{a}}$
is nonzero, in general. But the approximation is a good one in the sense that the
difference vanishes like $\|\Delta\mathbf{u}\|^2$ as $\Delta\mathbf{u} \to \mathbf{0}$.

By contrast, suppose $\mathrm{d}\mathbf{f}_{\mathbf{a}}$ is invertible. The inverse function theorem then says
that new coordinates $\Delta\mathbf{s} = \mathbf{h}(\Delta\mathbf{u})$ can be found so that

$$\Delta\mathbf{x} = \mathrm{d}\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{s}).$$

In these circumstances, $\mathrm{d}\mathbf{f}_{\mathbf{a}}$ equals $\Delta\mathbf{f}$ in a sufficiently small window (at least when
$\Delta\mathbf{f}$ is expressed in terms of the proper curvilinear coordinates); the remainder $O(2)$
is dispensed with. There are some minor technical differences, too. For Taylor’s
theorem, the components of $\mathbf{f}$ must have continuous second derivatives; for the inverse
function theorem, only continuous first derivatives are needed.

The Taylor approximation goes a long way toward clarifying the action of $\mathbf{f}$;
however, the inverse function theorem provides the ultimate simplification: it shows
that $\mathbf{f}$ is essentially linear near $\mathbf{a}$. Perhaps most significantly, the inverse function
theorem gives us a new tool to analyze maps: curvilinear coordinates and, more
generally, alternative coordinate systems.
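The contrast between the two descriptions can be seen in numbers for the quadratic map at $\mathbf{a} = (\sqrt{3}/2, 1/2)$. The Python sketch below is illustrative only: it assumes $\mathbf{f}(u, v) = (u^2 - v^2, 2uv)$ (not restated in this section), takes $\mathbf{h}$ from the contour-plot formulas given earlier, and uses function names chosen here. It compares the Taylor remainder $\Delta\mathbf{f} - \mathrm{d}\mathbf{f}_{\mathbf{a}}(\Delta\mathbf{u})$ with the difference $\Delta\mathbf{f} - \mathrm{d}\mathbf{f}_{\mathbf{a}}(\mathbf{h}(\Delta\mathbf{u}))$.

```python
import math

SQRT3 = math.sqrt(3.0)
A = (SQRT3 / 2, 0.5)

def f(u, v):
    """Assumed form of the quadratic map."""
    return (u * u - v * v, 2 * u * v)

def window_map(du, dv):
    """Delta-f(Delta-u) = f(a + Delta-u) - f(a)."""
    fa, fu = f(*A), f(A[0] + du, A[1] + dv)
    return (fu[0] - fa[0], fu[1] - fa[1])

def dfa(du, dv):
    """df_a = 2 * (rotation by pi/6), i.e. the matrix [[sqrt3, -1], [1, sqrt3]]."""
    return (SQRT3 * du - dv, du + SQRT3 * dv)

def h(du, dv):
    """Curvilinear coordinates (Delta-s, Delta-t), from the contour-plot formulas."""
    ds = du + SQRT3 * (du * du - dv * dv) / 4 + du * dv / 2
    dt = dv + SQRT3 * du * dv / 2 + (dv * dv - du * du) / 4
    return (ds, dt)

def taylor_remainder(du, dv):
    """Norm of Delta-f(Delta-u) - df_a(Delta-u)."""
    e, l = window_map(du, dv), dfa(du, dv)
    return math.hypot(e[0] - l[0], e[1] - l[1])

def curvilinear_error(du, dv):
    """Norm of Delta-f(Delta-u) - df_a(h(Delta-u))."""
    e, l = window_map(du, dv), dfa(*h(du, dv))
    return math.hypot(e[0] - l[0], e[1] - l[1])

if __name__ == "__main__":
    for scale in (0.4, 0.2, 0.1, 0.05):
        du, dv = scale, scale / 2
        print(scale, taylor_remainder(du, dv), curvilinear_error(du, dv))
```

Halving the window size cuts the Taylor remainder by about a factor of four, whereas the error in the curvilinear description stays at rounding level for every window size.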


Exercises 


5.1. Show that $\operatorname{arcsinh} y = \ln\left(y + \sqrt{y^2 + 1}\right)$; use this, the pullback substitution
$y = \sinh x$, and other properties of hyperbolic functions to show








5.2. a. Show that $\operatorname{arctanh} y = \dfrac{1}{2}\ln\left(\dfrac{1 + y}{1 - y}\right)$.

b. Use the pullback $y = \tanh x$ (not partial fractions) to determine

$$\int \frac{dy}{1 - y^2}.$$

5.3. Use $y = \sinh x$ to show

$$\int \frac{dy}{(1 + y^2)^{3/2}} = \frac{y}{\sqrt{1 + y^2}}.$$

5.4. Use suitable hyperbolic pullbacks to determine

$$\int \frac{dy}{(y^2 - 1)^{3/2}} \quad \text{and} \quad \int \frac{dy}{(1 - y^2)^{3/2}}.$$


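5.5. a. Sketch the graphs of $y = \operatorname{sech} x = 1/\cosh x$ and its inverse $x = \operatorname{arcsech} y$.

b. Show that $\operatorname{arcsech} y = \ln\left(\dfrac{1 \pm \sqrt{1 - y^2}}{y}\right)$.

c. Use both the graph and the formula for $x = \operatorname{arcsech} y$ to explain why its
domain is $0 < y \le 1$.

d. Show that the two halves of the graph of $x = \operatorname{arcsech} y$ are negatives of each
other by showing

$$\ln\left(\frac{1 + \sqrt{1 - y^2}}{y}\right) = -\ln\left(\frac{1 - \sqrt{1 - y^2}}{y}\right).$$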

5.6. a. Sketch the graph of $x = y - \ln(-y) + k$, $y < 0$; use $k = 2$. Indicate the
position of the “landmark” point $(x, y) = (-1 + k, -1)$ on the graph. Determine
the limiting value of x as y —> and check that your sketch reflects this 
fact. 

b. Sketch the inverse $y = f_3(x)$ of the function in part a. What are the domain
and the range of $f_3$?

c. Show that the differential equation $y' = y/(y - 1)$ has yet another solution
$y = f_4(x) = 0$.

d. Make a sketch of the $(x, y)$-plane that indicates there is precisely one
solution of the differential equation $y' = y/(y - 1)$ through each point $(x, y)$
in which $y \ne 1$. This sketch exhibits, visually, the general solution to the
differential equation.

5.7. Find the general solution of the differential equation

$$\frac{dy}{dx} = \frac{y}{y^2 + 1}.$$

Describe the solution in words and make a sketch that reflects its salient
features.




















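5.8. a. Solve the following equations for $x$ and $y$.

$$\theta = \arctan\frac{y}{x - 1}, \qquad \varphi = \arctan\frac{y}{x + 1}.$$

Note: $\theta$ and $\varphi$ are called the biangular coordinates for the plane.

b. Compute the Jacobians

$$\frac{\partial(x, y)}{\partial(\theta, \varphi)} \quad \text{and} \quad \frac{\partial(\theta, \varphi)}{\partial(x, y)},$$

and show by direct computation that they are reciprocals.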

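5.9. a. Solve the following equations for $x$ and $y$:

$$r_1 = \sqrt{(x - 1)^2 + y^2}, \qquad r_2 = \sqrt{(x + 1)^2 + y^2}.$$

Note: $r_1$ and $r_2$ are called the two-center bipolar coordinates for the
plane.

b. Compute the Jacobians

$$\frac{\partial(x, y)}{\partial(r_1, r_2)} \quad \text{and} \quad \frac{\partial(r_1, r_2)}{\partial(x, y)}$$

and verify directly that they are reciprocals.

[Margin figure: spherical coordinates $(\rho, \theta, \varphi)$ of the point $(x, y, z)$, with $r = \rho\cos\varphi$]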

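5.10. The following equations express Cartesian coordinates for space in terms of
spherical coordinates $\rho$, $\theta$, and $\varphi$ (cf. Exercise 3.27, p. 104):

$$x = \rho\cos\theta\cos\varphi, \qquad y = \rho\sin\theta\cos\varphi, \qquad z = \rho\sin\varphi.$$

a. Solve the equations for the spherical coordinates $\rho$, $\theta$, $\varphi$.

b. Compute the Jacobians

$$\frac{\partial(x, y, z)}{\partial(\rho, \theta, \varphi)} \quad \text{and} \quad \frac{\partial(\rho, \theta, \varphi)}{\partial(x, y, z)}$$

and verify directly that they are reciprocals.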

5.11. The following equations express Cartesian coordinates for space in terms of
cylindrical coordinates $r$, $\theta$, $z$:

$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z.$$

Show that

$$\frac{\partial(r, \theta, z)}{\partial(x, y, z)} = \frac{1}{\sqrt{x^2 + y^2}} = \frac{1}{r},$$

and explain why this is already evident from your knowledge of polar
coordinates in the plane.

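5.12. a. Show that the formulas that express spherical coordinates $(\rho, \theta, \varphi)$ in
terms of cylindrical coordinates in the new order $(r, z, \theta)$ are identical to
the formulas that express cylindrical coordinates in the same order in terms
of Cartesian coordinates:

$$\begin{aligned}
\rho &= f(r, z, \theta), & r &= f(x, y, z), \\
\theta &= g(r, z, \theta), & z &= g(x, y, z), \\
\varphi &= h(r, z, \theta), & \theta &= h(x, y, z).
\end{aligned}$$

It will be sufficient to determine $f$, $g$, and $h$.

b. Determine

$$\frac{\partial(\rho, \theta, \varphi)}{\partial(r, \theta, z)} \quad \text{and} \quad \frac{\partial(r, z, \theta)}{\partial(x, y, z)}$$

and verify directly that

$$\frac{\partial(\rho, \theta, \varphi)}{\partial(x, y, z)} = \frac{\partial(\rho, \theta, \varphi)}{\partial(r, \theta, z)}\,\frac{\partial(r, \theta, z)}{\partial(x, y, z)}.$$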

5.13. Use the Babylonian algorithm to determine $\sqrt{10}$ to 12 decimal places
accuracy. Take $x_0 = 3$; how many iterations were required? Take $x_0 = 10$; how
many iterations are required now?

5.14. a. To solve $x^2 = a$, the Babylonian algorithm first rewrites the equation as
$x = a/x$ and then finds iterates of the average of the left- and right-hand
sides: $g(x) = (x + a/x)/2$. This suggests solving $x^3 = a$ by iterating on the
average of $x$ and $a/x^2$: $g_1(x) = (x + a/x^2)/2$. Compute $\sqrt[3]{10}$ by finding the
fixed point of $g_1$ with $x_0 = 2$. Convergence is relatively slow; how many
iterates were needed to get 8 decimal places accuracy?

b. Convergence can be sped up by iterating on a weighted average of $x$ and
$a/x^2$: $g_2(x) = (2x + a/x^2)/3$. Compute $\sqrt[3]{10}$ by finding the fixed point of
$g_2$ with $x_0 = 2$. How many iterates were needed to get 8 decimal places
accuracy? How does this compare with convergence using $g_1$?

c. Devise an effective algorithm for solving $x^4 = a$ (that is not just the
original Babylonian algorithm applied to the pair $y^2 = a$, $x^2 = y$). Use your
algorithm to find $\sqrt[4]{120}$.
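For readers who want to experiment before working the two preceding exercises, here is one possible way to organize the iterations in Python. It is a sketch, not a prescribed solution: the helper names are invented here, the stopping rule (successive iterates closer than `tol`) is just one reasonable choice, and the demonstration deliberately uses $a = 2$ rather than the values in the exercises.

```python
def iterate_to_fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- g(x) until successive iterates differ by less than tol.
    Returns (approximate fixed point, number of iterations used)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

def babylonian(a):
    """g(x) = (x + a/x)/2, whose fixed point solves x^2 = a."""
    return lambda x: (x + a / x) / 2

def cube_average(a):
    """g1(x) = (x + a/x^2)/2, whose fixed point solves x^3 = a."""
    return lambda x: (x + a / x**2) / 2

def cube_weighted(a):
    """g2(x) = (2x + a/x^2)/3, a weighted average with the same fixed point."""
    return lambda x: (2 * x + a / x**2) / 3

if __name__ == "__main__":
    a = 2.0  # demonstration value only
    print("square root:", iterate_to_fixed_point(babylonian(a), 1.0))
    print("cube root, plain average:   ", iterate_to_fixed_point(cube_average(a), 1.0))
    print("cube root, weighted average:", iterate_to_fixed_point(cube_weighted(a), 1.0))
```

Comparing the iteration counts returned for `cube_average` and `cube_weighted` gives a first look at the difference in convergence speed that part b asks about.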

5.15. Use the Newton-Raphson method to find (to 6 decimal places) the three real
roots of $f(x) = x^3 - 3x + 1$. Sketch the graphs of $y = x^3 - 3x$ and $y = f(x)$ to
get the approximate locations of the roots to serve as initial values $x_0$ for the
three iterations.

5.16. Consider the linear map P : R 2 —> R 2 given by 



a. The map $P : (x, y) \mapsto (s, t)$ is rotation by angle $\alpha$; what is the value of $\alpha$?

b. In the $(s, t)$-plane, sketch the images of the $x$- and $y$-axes. According to
your sketch, does rotation by $\alpha$ turn the positive $s$-axis to the image of the
positive $x$-axis?

c. Now let $P$ pull back the variables $s$ and $t$ to provide a second coordinate
system in the $(x, y)$-plane. Describe how the new $(s, t)$ coordinate grid is
related to the original $(x, y)$ grid. In particular, describe the position of the
positive $s$-axis in relation to the positive $x$-axis.

d. If $R_\theta : (x, y) \mapsto (s, t)$ is rotation by the angle $\theta$, how does $R_\theta$ pull back the
$(s, t)$ coordinate grid to the $(x, y)$-plane? In particular, describe where the
positive $s$-axis appears in this pullback.

5.17. This exercise uses the semi-log map si (cf. page 161) and the fact that si
transforms the exponential function $y = B a^{kx}$ into the linear function

$$Y = (k \log_{10} a)\,x + \log_{10} B.$$

a. Plot the US population census data for 1790-1900 on semi-log graph paper
and verify that the points lie approximately on a straight line $L$. Let the
horizontal coordinate $x$ be years since 1790.

b. Estimate the slope and $Y$-intercept of the line $L$.

c. Use the estimates to obtain an exponential function $y = B\,10^{kx}$ that
approximates the US census values, where $x$ denotes years since 1790.

5.18. Define the log-log map (cf. page 161) $L : (x, y) \to (X, Y)$ by



a. Show that the image of the graph of $y = a x^p$ under the map $L$ is a straight
line. Determine the equation of this line.

b. Using log-log paper in which the coordinates of the lower left hand
corner are $(X, Y) = (0.1, 0.1)$, plot the graphs of $Y = (2/3)X + 4$ and
$Y = -2X + 1$. Sketch the pullbacks of these graphs in the $(x, y)$-plane
(using an ordinary uniform coordinate grid).
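5.19. This exercise concerns the quadratic map $\mathbf{f}$ of Example 3 (p. 161) and the
local coordinate change $\mathbf{h}$ that factors the window map $\Delta\mathbf{f}$ (cf. p. 163). Write
$\mathbf{h}$ as $\mathbf{h}_{\mathbf{a}}$ to reflect the fact that this coordinate change depends on the point
$\mathbf{a} = (a, b)$ in the $(u, v)$-plane at which $\Delta\mathbf{f}$ is constructed, and then write the
input $(\Delta u, \Delta v)$ of $\mathbf{h}_{\mathbf{a}}$ more simply as $\mathbf{p} = (p, q)$.

a. Verify that $\mathbf{f}(\mathbf{u}) = \mathbf{f}(-\mathbf{u})$ for every $\mathbf{u}$.

b. Verify that $\mathbf{h}_{\mathbf{a}}(\mathbf{0}) = \mathbf{0}$, $\mathbf{h}_{\mathbf{a}}(-\mathbf{a}) = -\tfrac{1}{2}\mathbf{a}$, and $\mathbf{h}_{\mathbf{a}}(-2\mathbf{a}) = \mathbf{0}$.

c. Show that $\mathbf{h}_{\mathbf{a}}$ fails to be 1-1 on any neighborhood of $-\mathbf{a}$ by showing that
$\mathbf{h}_{\mathbf{a}}(-\mathbf{a}(1 + \varepsilon)) = \mathbf{h}_{\mathbf{a}}(-\mathbf{a}(1 - \varepsilon))$ for any $\varepsilon$.

d. As noted in the text, $\mathbf{h}_{\mathbf{a}}$ is invertible on any sufficiently small square
window $W$ centered at $\mathbf{p} = \mathbf{0}$. Give an upper bound on the length of the side
of $W$.

e. Show that $\mathbf{h}_{\mathbf{a}}$ is conformal everywhere inside $W$ (part d) by showing the
derivative $\mathrm{d}(\mathbf{h}_{\mathbf{a}})_{\mathbf{p}}$ is a dilation-rotation matrix (or similarity
transformation, p. 118) for each point $\mathbf{p}$ in $W$.

f. Show that the dilation factor of $\mathrm{d}(\mathbf{h}_{\mathbf{a}})_{\mathbf{p}}$ is

$$\sqrt{\frac{(a + p)^2 + (b + q)^2}{a^2 + b^2}}$$

and conclude that $\mathrm{d}(\mathbf{h}_{\mathbf{a}})_{-\mathbf{a}}$ is the zero linear map.

g. Determine the rotation angle $\theta$ of $\mathrm{d}(\mathbf{h}_{\mathbf{a}})_{\mathbf{p}}$ in terms of $\mathbf{a}$ and $\mathbf{p}$, and deduce
that $\theta > 0$ when $\mathbf{p} = (p, q)$ is above the line $q = (b/a)p$ and $\theta < 0$ below
it. Confirm this fact in the figure on page 164 that illustrates the action of
$\mathbf{h}_{(\sqrt{3}/2,\,1/2)}$.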

5.20. Let $W$ be the infinite strip $-\pi/2 \le x \le \pi/2$ in the $(x, y)$-plane; let $\mathbf{s} : W \to \mathbb{R}^2$
be the map

$$\mathbf{s} : \begin{cases} u = \sin x \cosh y, \\ v = \cos x \sinh y. \end{cases}$$

a. Determine the derivative $\mathrm{d}\mathbf{s}_{(x,y)}$. Determine the Jacobian $J(x, y)$ and show
that $J > 0$ everywhere except at the two points $(x, y) = (\pm\pi/2, 0)$.

b. Show that the map $\mathbf{s}$ is conformal (cf. Exercise 5.19) everywhere except at
the two points $(x, y) = (\pm\pi/2, 0)$.

c. Show that the image of the horizontal line segment $y = b$ is the upper half
of the ellipse

$$\left(\frac{u}{\cosh b}\right)^2 + \left(\frac{v}{\sinh b}\right)^2 = 1$$

if $b > 0$, and the lower half if $b < 0$. What happens if $b = 0$?

d. Show that the image of the vertical line $x = a$ is the right branch of the
hyperbola

$$\left(\frac{u}{\sin a}\right)^2 - \left(\frac{v}{\cos a}\right)^2 = 1$$

if $0 < a < \pi/2$ and the left branch if $-\pi/2 < a < 0$. What happens if $a = 0$,
$-\pi/2$, or $\pi/2$?

e. Conclude that $\mathbf{s}$ is invertible on $W \setminus (\pm\pi/2, 0)$ (i.e., $W$ with the two points
$(\pm\pi/2, 0)$ removed), and thus defines a coordinate change there.

f. The coordinate change $\mathbf{s}$ puts curvilinear $(x, y)$ coordinates on the $(u, v)$-plane.
Sketch that coordinate grid in the square $|u| \le 2$, $|v| \le 2$. How does
this grid manifest the conformality of the map $\mathbf{s}$?

g. Sketch the same curvilinear $(x, y)$-grid on the larger square where $|u| \le 20$,
$|v| \le 20$. On this square, the grid should look like the polar coordinate grid;
does it? Is conformality still evident?

5.21. Let U be the right half-plane u > 0, and let h : U —> R 2 be the map 



a. Show that the image $\mathbf{h}(U)$ is the first quadrant $Q : x > 0,\ y > 0$.

b. Find the inverse $\mathbf{h}^{-1}$ on $Q$ to show that $\mathbf{h}$ is a coordinate change.

c. Determine the Jacobians $\partial(x,y)/\partial(u,v)$ and $\partial(u,v)/\partial(x,y)$, and show that they are reciprocals.

d. Sketch the curvilinear (u, v)-coordinate grid on Q, and the curvilinear 
(x, y)-grid on U. Is the map h conformal? 

5.22. Consider the map $\mathbf{m} : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$\mathbf{m} : \begin{cases} p = e^{s} \cosh t, \\ q = e^{s} \sinh t. \end{cases}$$

a. Determine the image $M = \mathbf{m}(\mathbb{R}^2)$ in the $(p,q)$-plane, and sketch the curvilinear $(s,t)$-coordinate grid there.

b. Determine the inverse $\mathbf{m}^{-1}$ on $M$ to show that $\mathbf{m}$ is a coordinate change. Sketch the curvilinear $(p,q)$-grid in the $(s,t)$-plane.

c. Determine the Jacobians $\partial(p,q)/\partial(s,t)$ and $\partial(s,t)/\partial(p,q)$, and show that they are reciprocals.

d. You should notice similarities between the maps $\mathbf{h}$ and $\mathbf{m}$ of the previous exercise and this one. Show that the coordinate changes

$$\mathbf{u} : \begin{cases} u = e^{s}, \\ v = t; \end{cases} \qquad \mathbf{p} : \begin{cases} p = (y + x)/2, \\ q = (y - x)/2 \end{cases}$$

convert $\mathbf{h}$ into $\mathbf{m}$. That is, show $\mathbf{m} = \mathbf{p} \circ \mathbf{h} \circ \mathbf{u}$, and sketch all these maps together.

5.23. Let $U$ be the upper half-plane $y > 0$ and let $\mathbf{a} : U \to \mathbb{R}^2 : (x,y) \mapsto (\theta,\varphi)$ be the map defined by the biangular coordinates (cf. Exercise 5.8):

$$\theta = \arctan\frac{y}{x-1}, \qquad \varphi = \arctan\frac{y}{x+1}.$$

a. Show that the image $\mathbf{a}(U)$ is the triangular region

$$T : \begin{cases} 0 < \theta < \pi, \\ 0 < \varphi < \theta. \end{cases}$$

Conclude that $\mathbf{a} : U \to T$ is a coordinate change.

b. Identify, within $T$, the images of the lines $x = \pm 1$. Also indicate the limit of $\mathbf{a}(x,y)$ as $y \to 0$ and (i) $x < -1$; (ii) $-1 < x < 1$; and (iii) $1 < x$.

c. The curvilinear (x,y)-coordinate grid in T is shown in the margin. Identify 
the grid lines x = const, and y = const., and indicate how x and y vary 
through the grid. Indicate how the grid illustrates the limits you determined 
in the previous part. 

d. Referring to the curvilinear $(x,y)$-coordinate grid, indicate the geometric action of $\mathbf{a}^{-1}$ on $T$. That is, indicate how $\mathbf{a}^{-1}$ “opens up” $T$ to become the upper half-plane. Does $\mathbf{a}^{-1}$ reverse orientation? How is the answer to this question indicated in the geometric action?

e. Draw the $(\theta,\varphi)$-coordinate grid in the $(x,y)$-plane. (This is, in fact, easy to do; do you see why?)

5.24. Let $U$ be the upper half-plane $y > 0$ and let $\mathbf{b} : U \to \mathbb{R}^2 : (x,y) \mapsto (r_1,r_2)$ be the map defined by the two-center bipolar coordinates (cf. Exercise 5.9):

$$r_1 = \sqrt{(x-1)^2 + y^2}, \qquad r_2 = \sqrt{(x+1)^2 + y^2}.$$

a. Show that $r_1$ and $r_2$ satisfy the inequalities

$$2 < r_1 + r_2, \qquad -2 < r_2 - r_1 < 2.$$

This defines a “half-infinite” strip $S$ in the $(r_1,r_2)$-plane; $\mathbf{b}(U) = S$.

b. Explain why the map $\mathbf{b} : U \to S$ is a coordinate change. It follows that the (image of the) Cartesian $(x,y)$-grid defines curvilinear coordinates in $S$. Sketch this curvilinear grid. Are the curvilinear grid lines perpendicular?

c. Sketch, in $U$, the curvilinear coordinate grid defined by $r_1$ and $r_2$. (This is easy to do.)

d. The map $\mathbf{b}$ is well defined on the $x$-axis. Sketch the image of the $x$-axis in the $(r_1,r_2)$-plane; note in particular the images of the points $(\pm 1, 0)$. How is the image related to $S$?




5.25. The following map $\sigma : U^4 \to \mathbb{R}^4 : (r,\mathbf{t}) \mapsto \mathbf{x}$ defines the analogue of spherical coordinates on $\mathbb{R}^4$:

$$\sigma : \begin{cases} x_1 = r \cos t_1 \cos t_2 \cos t_3, \\ x_2 = r \sin t_1 \cos t_2 \cos t_3, \\ x_3 = r \sin t_2 \cos t_3, \\ x_4 = r \sin t_3, \end{cases} \qquad U^4 : \begin{cases} 0 < r, \\ -\pi < t_1 < \pi, \\ -\pi/2 < t_2 < \pi/2, \\ -\pi/2 < t_3 < \pi/2. \end{cases}$$

a. Describe the image $V^4 = \sigma(U^4)$.

b. Obtain the derivative $d\sigma_{(r,\mathbf{t})}$ and show that $\det(d\sigma_{(r,\mathbf{t})}) = r^3 \cos t_2 \cos^2 t_3$.

c. Deduce that $\sigma$ is locally invertible everywhere in $U^4$.

d. Find a formula for the (global) inverse of $\sigma$ on $V^4$.


Chapter 6 

Implicit Functions 


Abstract Given a relation between two variables expressed by an equation of the form $f(x,y) = k$, we often want to “solve for $y$.” That is, for each given $x$ in some interval, we expect to find one and only one value $y = \varphi(x)$ that satisfies the relation. The function $\varphi$ is thus implicit in the relation; geometrically, the locus of the equation $f(x,y) = k$ is a curve in the $(x,y)$-plane that serves as the graph of the function $y = \varphi(x)$. The problem of implicit functions (and the aim of this chapter) is to determine the function $\varphi$ from the relation $f$, or at least to determine that $\varphi$ exists when its exact form cannot be found. There are analogues of this problem in all dimensions; that is, $x$ and $y$ can be vectors, and the relation $f(x,y) = k$ can expand into a set of equations. However, we begin our analysis with a single equation, because the various impediments to finding the implicit function already occur there.


6.1 A single equation 

Perhaps the most familiar example of an implicitly defined function is provided by the equation $f(x,y) = x^2 + y^2$. The locus $f(x,y) = k$ is a circle of radius $\sqrt{k}$ if $k > 0$; we can view it as the graph of two different functions,

$$y = \varphi_\pm(x) = \pm\sqrt{k - x^2}.$$

But if $k < 0$, the locus is the empty set; there is no implicit function at all. This is the first impediment: there may be no pairs $(x,y)$ whatsoever that satisfy the relation $f(x,y) = k$. We need to know, somehow, that the relation is nonempty; that is, there is at least one point $(a,b)$ for which $f(a,b) = k$, so $\varphi(a) = b$. Think of this point as a kind of “seed” from which the function $y = \varphi(x)$ can “grow.”

For example, when $x^2 + y^2 = k > 0$, we can take either $(0, +\sqrt{k})$ or $(0, -\sqrt{k})$ as a seed; then $\varphi_+$ grows out of $(0, \sqrt{k})$, and $\varphi_-$ grows out of $(0, -\sqrt{k})$. This example calls attention to the fact that we must expect the implicit function to be local, that is, to be defined only on part of the locus $f(x,y) = k$. Different parts of the locus may therefore be graphs of different implicit functions. Any point of the form $(a,b) = (a, +\sqrt{k - a^2})$, with $-\sqrt{k} < a < \sqrt{k}$, would serve equally well as a seed for $\varphi_+$; likewise, any point $(a, -\sqrt{k - a^2})$ would serve for $\varphi_-$.

[Figures: the circle $x^2 + y^2 = k$ as the union of the graphs $y = \varphi_+(x)$ and $y = \varphi_-(x)$, with the seeds $(0, \pm\sqrt{k})$; the same circle near the point $(\sqrt{k}, 0)$.]

Points on the locus
that are not seeds

[Margin figure: the locus $x^2 - y^2 = 0$.]

A “flat” function has
a 2-dimensional locus

This leaves only the points $(\pm\sqrt{k}, 0)$. As the figure on the right shows, $(\sqrt{k}, 0)$ cannot be a seed for a function of $x$, because the circle has no $y$-value at all when $x > \sqrt{k}$ and gives two $y$-values near $y = 0$ when $x < \sqrt{k}$ and $x$ is arbitrarily close to $\sqrt{k}$. There is a similar problem for $(-\sqrt{k}, 0)$. (Of course, $(\sqrt{k}, 0)$ serves perfectly well as a seed for a function $x = \psi(y)$, but we concentrate on $x$ as the independent variable for the moment.) In a different way, there is no seed when $k = 0$. Certainly there is a point on the locus, namely $(a,b) = (0,0)$, but nothing can grow out of it, because the entire locus $x^2 + y^2 = 0$ is just this single point.

Although there is nothing wrong with having two different parts of the locus be the graph of two different implicit functions, we do require that only one implicit function $\varphi$ should be able to grow out of a given seed on that locus. This is a significant restriction, and places yet another impediment in the way of obtaining $\varphi$. We can illustrate the problem with the quadratic equation $f(x,y) = y^2 - x^2 = 0$. The locus is a pair of lines that cross at the origin. Hence, we find that four different implicit functions grow out of a seed at the origin:

$$\varphi_1(x) = x, \qquad \varphi_2(x) = -x, \qquad \varphi_3(x) = |x|, \qquad \varphi_4(x) = -|x|.$$

Nothing in either the locus or the seed indicates which of these we should choose; therefore we have failed to determine the implicit function we seek.
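The ambiguity at the origin can be checked numerically. The following is a minimal sketch, not part of the original text; the names `candidates`, `on_locus`, and `results` are chosen here purely for illustration. It confirms that each of the four functions passes through the seed $(0,0)$ and stays on the locus $y^2 - x^2 = 0$ at a set of sample points.

```python
# Four candidate implicit functions through the seed (0, 0) on y^2 - x^2 = 0.
def f(x, y):
    return y**2 - x**2

# Illustrative names only; they mirror phi_1 .. phi_4 in the text.
candidates = {
    "phi1": lambda x: x,
    "phi2": lambda x: -x,
    "phi3": lambda x: abs(x),
    "phi4": lambda x: -abs(x),
}

xs = [i / 10 for i in range(-10, 11)]  # sample points in [-1, 1]

def on_locus(phi):
    # True if every sampled point (x, phi(x)) satisfies f(x, y) = 0.
    return all(abs(f(x, phi(x))) < 1e-12 for x in xs)

results = {name: on_locus(phi) for name, phi in candidates.items()}
print(results)
```

All four candidates satisfy the relation and the seed condition, so neither piece of data can single one of them out.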

The same problem appears in an even more exaggerated form when the locus is not a curve but is a full 2-dimensional region, such as the unit disk $D : x^2 + y^2 \le 1$. This is what happens for the “flat” function that is defined by the formula

$$f(x,y) = \text{the square of the distance from } (x,y) \text{ to } D = \begin{cases} 0 & \text{if } x^2 + y^2 \le 1, \\ \left(\sqrt{x^2 + y^2} - 1\right)^2 & \text{otherwise.} \end{cases}$$

The graph of $f$ is flat on $D$, so the locus $f(x,y) = 0$ is $D$ itself; see below. The points at zero distance from $D$ are precisely the points of $D$. (Because $f$ measures the square of the distance to $D$, it is differentiable everywhere, including on the boundary of $D$.) It is clear that the graph of any continuous function $y = \varphi(x)$ that grows out of a seed $(a,b)$ in the interior of $D$ will lie in $D$, at least for $x$ sufficiently near $a$.
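A short sketch makes the “flat” function concrete. The function name `flat`, the list `curves`, and the three sample curves through the seed $(0,0)$ are choices made here for illustration only; the point is that quite different continuous functions all satisfy $f(x,\varphi(x)) = 0$ near the seed.

```python
import math

def flat(x, y):
    # Square of the distance from (x, y) to the closed unit disk D.
    r = math.hypot(x, y)
    return 0.0 if r <= 1.0 else (r - 1.0) ** 2

# Three unrelated curves through the interior seed (0, 0); illustrative choices.
curves = [lambda x: 0.0, lambda x: x / 2, lambda x: x * x]

xs = [i / 20 for i in range(-10, 11)]  # x in [-0.5, 0.5], well inside D
all_on_locus = all(flat(x, phi(x)) == 0.0 for phi in curves for x in xs)
print(all_on_locus)
```

Outside the disk the function is positive; for instance, `flat(2, 0)` is the squared distance $1$.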



Notice what is true about the tangent to the locus $f(x,y) = k$ at a putative seed point $(a,b)$ in each of our examples thus far. When there was no uniquely defined $\varphi(x)$, either there was no tangent ($x^2 + y^2 = 0$), there was more than one tangent (both $x^2 - y^2 = 0$ and the function vanishing on the unit disk), or there was a vertical tangent ($x^2 + y^2 = k$ at $(\pm\sqrt{k}, 0)$). We were successful only when the locus had a single nonvertical tangent at the seed point. It seems reasonable, then, to conjecture that this is a sufficient condition for the existence of a unique implicit function of $x$.

Unfortunately, a single nonvertical tangent is not enough. Consider the locus $y^2 - x^4 = (y - x^2)(y + x^2) = 0$. It is the union of the two parabolas $y = x^2$ and $y = -x^2$, and it has a single nonvertical tangent at every point, including the origin. Nevertheless, there are still four different implicit functions that grow out of the seed at the origin; the figure shows one of them. (Of course, every other point $(a, \pm a^2)$, $a \neq 0$, on the locus is the seed for a unique implicit function.)

To revise the conjecture, let us make use of the fact that an implicit function is a local object. Then we can search for it with basic tools of local analysis, in particular, with Taylor’s theorem. Thus, if we suppose that $f(x,y)$ has continuous second derivatives in a neighborhood of $(a,b)$, its first-order Taylor expansion is

$$f(x,y) = f(a,b) + f_x(a,b)\,\Delta x + f_y(a,b)\,\Delta y + O(2)$$

in a window centered at $(a,b)$; $\Delta x = x - a$ and $\Delta y = y - b$ are the usual window coordinates. Because $f(a,b) = k$, the equation of the locus $f(x,y) = k$ reduces to

$$f_x(a,b)\,\Delta x + f_y(a,b)\,\Delta y + O(2) = 0$$

in the window. That is, the terms after $f(a,b)$ in the Taylor expansion must sum to zero. Within the window, this is the equation of the locus. If we delete the higher-order term $O(2)$, the remaining equation is called the linearization of the locus at $(a,b)$:

$$f_x(a,b)\,\Delta x + f_y(a,b)\,\Delta y = 0.$$

This is a linear equation; if at least one coefficient $f_x(a,b)$, $f_y(a,b)$ is nonzero, it is the equation of a straight line, the tangent line to the locus at $(a,b)$. Furthermore, if $f_y(a,b) \neq 0$, we can solve the linearized equation for $\Delta y$,

$$\Delta y = -\frac{f_x(a,b)}{f_y(a,b)}\,\Delta x,$$

implying that the tangent has finite slope $m = -f_x(a,b)/f_y(a,b)$.

A conjecture

[Margin figure: the locus $y^2 - x^4 = 0$ with one implicit function through the origin.]

Linearizing the locus



The nature of 
the linearization 


The linearization 
determines the 
implicit function 


In the figure above, the locus $f(x,y) = k$ has been linearized at two different points $(a,b)$, with fundamentally different results. At the lower right, the linearization has finite slope and implicitly defines the linear function $\Delta y = m\,\Delta x$. In particular, $f_y(a,b) \neq 0$. Furthermore, it is evident that the locus itself determines a unique implicit function $y = \varphi(x)$ when $x$ is near $a$ and $\varphi(a) = b$. We can even see that $\varphi$ is differentiable at $x = a$ and

$$\varphi'(a) = m = -\frac{f_x(a,b)}{f_y(a,b)} = -\frac{f_x(a,\varphi(a))}{f_y(a,\varphi(a))}.$$


Now compare this to what happens at the second point, in the upper left. The linearization there is the vertical line $\Delta x = 0$. This means that $f_y(a,b) = 0$, and the linearization is not the graph of any implicit (linear) function of the form $\Delta y = m\,\Delta x$. Likewise, no implicit function of $x$ can grow out of the seed point $(a,b)$ on the original locus.

According to our evidence, the condition $f_y(a,b) \neq 0$ guarantees that the linearized locus determines a unique implicit function of $x$, and that is enough to ensure that the original locus does too. The evidence also suggests that we can connect the derivative of $\varphi$ (where it exists) to the partial derivatives of $f$, extending the formula for $\varphi'(a)$ we found above. Just differentiate the identity $k = f(x,\varphi(x))$ using the chain rule to get

$$0 = \frac{d}{dx} f(x,\varphi(x)) = f_x(x,\varphi(x)) + f_y(x,\varphi(x))\,\varphi'(x).$$

Because $f_y$ is continuous by hypothesis, the condition $f_y(a,b) \neq 0$ implies $f_y(x,y) \neq 0$ for all $(x,y)$ sufficiently close to $(a,b)$. This allows us to solve for $\varphi'(x)$:

$$\varphi'(x) = -\frac{f_x(x,\varphi(x))}{f_y(x,\varphi(x))}.$$

Theorem 6.1 (Implicit function theorem). If $f(x,y)$ has continuous first derivatives in a neighborhood of the point $(a,b)$, and $f(a,b) = k$, $f_y(a,b) \neq 0$, then there is a unique function $y = \varphi(x)$ defined and continuously differentiable on an open interval $I$ containing $a$ for which

• $f(x,\varphi(x)) = k$ for all $x$ in $I$.

• $\varphi(a) = b$.

• $\varphi'(x) = -\dfrac{f_x(x,\varphi(x))}{f_y(x,\varphi(x))}$ for all $x$ in $I$.
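The derivative formula in the theorem can be tested on the circle, where the implicit function is known in closed form. This is a minimal sketch under assumptions chosen here: $k = 1$, the seed $(a,b) = (0.6, 0.8)$ on the upper semicircle, and a central-difference step of $10^{-6}$. The helper names `slope_formula` and `slope_numeric` are illustrative.

```python
import math

k = 1.0

def f(x, y):
    return x * x + y * y

def f_x(x, y):
    return 2 * x

def f_y(x, y):
    return 2 * y

def phi(x):
    # Upper branch phi_+ of the circle x^2 + y^2 = k.
    return math.sqrt(k - x * x)

a, b = 0.6, 0.8  # assumed seed on the locus; f_y(a, b) = 1.6 is nonzero

def slope_formula(x):
    # phi'(x) = -f_x / f_y, evaluated on the graph of phi.
    return -f_x(x, phi(x)) / f_y(x, phi(x))

def slope_numeric(x, h=1e-6):
    # Central difference approximation to phi'(x).
    return (phi(x + h) - phi(x - h)) / (2 * h)

print(slope_formula(a), slope_numeric(a))
```

At the seed the formula gives $-a/b = -0.75$, and the numerical slope agrees to within the accuracy of the difference quotient.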

Before we prove the implicit function theorem, let us take a closer look at the condition $f_y(a,b) \neq 0$. It expresses our informal conjecture that the locus $f(x,y) = k$ should have a single nonvertical tangent at the seed point $(a,b)$, but it is both more precise and more restrictive. For example, although the locus $f(x,y) = y^2 - x^4 = 0$ (p. 187) appeared to have a single horizontal tangent at the origin, we find $f_x(0,0) = f_y(0,0) = 0$, so the linearized locus is not a horizontal line; it is not a line at all. We call the origin a critical point of $f$.

Definition 6.1 We say $(a,b)$ is a critical point of the differentiable function $f(x,y)$ if $f_x(a,b) = f_y(a,b) = 0$. If either partial derivative is nonzero, we say $(a,b)$ is a regular point of $f$.

The implicit function theorem rules out any critical point of $f$ as a seed. Indeed, in most of the problematic examples that led to our original conjecture, we were attempting to make a critical point be a seed.

So suppose $(a,b)$ is a regular point of the function $z = f(x,y)$. Either $f_y(a,b) \neq 0$ and the locus $f(x,y) = f(a,b)$ is the graph of a differentiable function of $x$ near $(a,b)$ (by the implicit function theorem), or else $f_x(a,b) \neq 0$ and, switching the roles of $x$ and $y$, we see the locus is the graph of a differentiable function of $y$. In either case, the locus is the graph of some differentiable function and is thus a differentiable curve near $(a,b)$.

Definition 6.2 If $(a,b)$ is a regular point of the continuously differentiable function $f(x,y)$, and $f(a,b) = k$, then we say $(a,b)$ is a regular point of the curve $f(x,y) = k$. If all points on the locus are regular, we say the curve itself is regular.

The locus $f(x,y) = k$ is one of the level sets, or contours, of $f$. At a regular point of a contour, at least one of the partial derivatives of $f$ is different from zero. By continuity, that derivative remains nonzero at all sufficiently nearby points. Therefore, near the given regular point, all contours are regular. The following theorem says even more: it says that a suitably chosen coordinate change will “straighten out” those contours. This implies that there is essentially only one way to arrange the contours near a regular point. It also leads to a quick proof of the implicit function theorem.


No seed at 
a critical point 


Near a regular point,
the locus is a curve

Straightening
level curves with a
nonlinear shear


Theorem 6.2. Suppose $(a,b)$ is a regular point of a function $z = f(x,y)$ that has continuous first derivatives. Then there is a coordinate change $(u,v) = \mathbf{h}(x,y)$ defined on a window centered at $(a,b)$ that transforms the level curves of $f$ into the coordinate lines $v = \text{constant}$.


Proof. At least one of $f_x(a,b)$, $f_y(a,b)$ is nonzero; suppose $f_y(a,b) \neq 0$. Define $\mathbf{h}$ by the formulas

$$\mathbf{h} : \begin{cases} u = x, \\ v = f(x,y). \end{cases}$$

(If $f_y(a,b) = 0$, then $f_x(a,b) \neq 0$; set $u = y$ instead; see Exercise 6.10.) Because $f$ has continuous first derivatives, so does $\mathbf{h}$. Moreover,

$$d\mathbf{h}_{(x,y)} = \begin{pmatrix} 1 & 0 \\ f_x(x,y) & f_y(x,y) \end{pmatrix},$$

so $\det d\mathbf{h}_{(a,b)} = f_y(a,b) \neq 0$, implying that $d\mathbf{h}_{(a,b)}$ is invertible. By the inverse function theorem (Theorem 5.2, p. 169), $\mathbf{h}$ has a continuously differentiable inverse defined on a neighborhood of $\mathbf{h}(a,b) = (a, f(a,b)) = (a,k)$. Thus $\mathbf{h}$ is a valid coordinate change near $(a,b)$. Because $v = f(x,y)$, $\mathbf{h}$ transforms each level curve $f(x,y) = \lambda$ into the coordinate line $v = \lambda$. □




Thus, near a regular point, the level curves of a real-valued function are part of a curvilinear coordinate system; near that point, those curves are always roughly parallel and evenly spaced. We can see this in the figure above, which shows a coordinate change that straightens the levels of $f(x,y) = y^2 - 4x^2(2x - 1)$ near the point $(a,b) = (0.18, 0.417)$; $k = 0.35$. (Note: The origin is not at the intersection of the coordinate axes in either figure.) The coordinate change $\mathbf{h}$ is a nonlinear shear. It maps each vertical line to itself; this is the geometric meaning of the equation $u = x$ in the definition of $\mathbf{h}$. Each vertical line just slides up or down, stretched by different amounts at different points. At the point $(\alpha,\beta)$ on $x = \alpha$, the vertical stretch factor is $f_y(\alpha,\beta)$. For example, at points in the lower half of the figure above, we can see that the stretch factor is less than 1 (though still positive), because $\mathbf{h}$ shrinks vertical distances there. You can analyze another example, with a simpler function $f(x,y)$, in Exercise 6.8.

We take up now the proof of the implicit function theorem, Theorem 6.1. Let us write the inverse of $\mathbf{h}$ in terms of components:

$$\mathbf{h}^{-1} : \begin{cases} x = u, \\ y = g(u,v). \end{cases}$$

The first component is just the first component of $\mathbf{h}$ itself. The second component, $g$, is a continuously differentiable function of $u$ and $v$. The inverse relation between $\mathbf{h}^{-1}$ and $\mathbf{h}$ implies

$$(x,y) = \mathbf{h}^{-1}(\mathbf{h}(x,y)) = \mathbf{h}^{-1}(x, f(x,y)) = (x, g(x, f(x,y))),$$

and, in particular, $y = g(x, f(x,y))$. Therefore, if $v = f(x,y) = k$, then $y = g(x,k)$. This is the implicit function we seek: $\varphi(x) = g(x,k)$. □
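The construction in the proof can be imitated numerically. The sketch below uses the function $f(x,y) = y^2 - 4x^2(2x-1)$ from the example above, but with an assumed seed $(a,b) = (0.18, 0.5)$ chosen here for illustration (so $k$ is computed from the seed rather than taken from the text). The second component $g$ of $\mathbf{h}^{-1}$ is found by Newton's method in $y$, which is one convenient choice and not the method of the text; the implicit function is then $\varphi(x) = g(x,k)$.

```python
def f(x, y):
    return y**2 - 4 * x**2 * (2 * x - 1)

def f_y(x, y):
    return 2 * y

def h(x, y):
    # The nonlinear shear: u = x, v = f(x, y).
    return (x, f(x, y))

def g(u, v, y0, tol=1e-14, max_iter=50):
    # Second component of h^{-1}: solve f(u, y) = v for y by Newton's
    # method, starting from y0 (assumes f_y stays nonzero near y0).
    y = y0
    for _ in range(max_iter):
        step = (f(u, y) - v) / f_y(u, y)
        y -= step
        if abs(step) < tol:
            break
    return y

def h_inv(u, v, y0):
    return (u, g(u, v, y0))

a, b = 0.18, 0.5   # assumed seed, for illustration only
k = f(a, b)

def phi(x):
    # The implicit function phi(x) = g(x, k), grown from the seed.
    return g(x, k, b)

for x in (0.10, 0.18, 0.26):
    print(x, phi(x), f(x, phi(x)) - k)
```

The printed residuals $f(x,\varphi(x)) - k$ are zero to rounding error, so the graph of $\varphi$ lies on the level curve through the seed.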



The idea behind the proof of Theorem 6.1 is that it becomes easy to find the implicit function if we use the right coordinate system. We choose coordinates $(x,y) \mapsto (u,v)$ in which the level curves of $f(x,y)$ become coordinate lines $v = \lambda$. Then the implicit function defined by $f(x,y) = k$ is just the constant function $v = k$ in the new coordinates, and the inverse coordinate change $(u,v) \mapsto (x,y)$ converts this line $v = k$ into the graph of a (generally nonlinear) function $y = \varphi(x)$.

Underlying the proof is a basic principle we have used several times: coordinates are just labels for points, and we should choose labels that make the geometry most intelligible. In this setting there are two fundamental objects, both geometric: the first is the plane $\mathcal{P}$ and its constituent points $p$; the second is a real-valued map $\mathcal{F} : \mathcal{P} \to \mathbb{R} : p \mapsto \mathcal{F}(p)$ defined on $\mathcal{P}$. To describe that map, we introduce analytic tools: coordinates $(x,y)$ to label the points, and the appropriate expression $f(x,y)$ to represent $\mathcal{F}$:

$$\begin{array}{lccc}
\text{analytic level:} & \mathbb{R}^2 & \xrightarrow{\;f\;} & \mathbb{R} \\
 & \uparrow & & \| \\
\text{geometric level:} & \mathcal{P} & \xrightarrow{\;\mathcal{F}\;} & \mathbb{R}
\end{array}$$

In practice, we start at the analytic level. Conceptually, though, it helps to let the geometry come first; the analysis is then overlaid as a language with which to describe it. For example, equations such as $f(x,y) = k$ are a way to describe level curves of the more fundamental geometric map $p \mapsto \mathcal{F}(p)$.

Proof of the implicit
function theorem

Making coordinate

Geometry underlies
analysis

Coordinates as
languages

The geometric view

Geometry simplifies

We can take the language analogy further. Just as the world of objects and ideas is described by a variety of human languages, the geometry of points and maps can be described by a variety of coordinate systems and analytic expressions. Two human languages are connected by a pair of translation dictionaries; the geometric analogue is a coordinate change. The following diagram shows how the two coordinate systems we used to analyze the level curves of $\mathcal{F}$ are related by the coordinate change $\mathbf{h}$. In the diagram, “$v$” stands for a coordinate function as well as a coordinate; as a function, it assigns to the ordered pair $(u,v)$ the number $v$.

$$\mathbb{R}^2 \xrightarrow{\;\mathbf{h}\;} \mathbb{R}^2, \qquad (x,y) \mapsto (u,v)$$




Before we resume our work on the implicit function theorem, let us pause to recall a couple of places where the geometric view has already come to the fore. One was in the study of $2 \times 2$ matrices. We viewed them as certain maps of the plane that are characterized geometrically (Theorem 2.6, p. 40) by their eigenvalues and eigenvectors. Like $f$ and $v$ in the diagram above, different matrices can represent the same geometric map. We defined two matrices to be equivalent if a linear coordinate change would convert one into the other, and we saw that equivalent matrices had the same geometric action. Another place where we ended up with a geometric view was with the inverse function theorem. According to Corollary 5.4 (p. 176), which was suggested by our work in the example on pages 161-165, a nonlinear map $\mathbf{f} : U^n \to \mathbb{R}^n$ looks like its linear approximation $d\mathbf{f}_{\mathbf{a}}$ near any point $\mathbf{a}$ where the linear approximation is invertible.

So geometry helps us simplify, and it does so by “lumping together” things (such as equivalent matrices) that we would otherwise treat as distinct. In the diagram above, the coordinate change $\mathbf{h}$ that converts $f$ into $v$ allows us to “lump together” those two functions, and therefore to say that $f$ is essentially a coordinate function. The simplification is this: near a regular point, any real-valued function is essentially just a coordinate function. From this observation we then get, first, the structure of the level curves, and second, the implicit function theorem.

$$\begin{array}{ccc}
(x,y) & \xrightarrow{\;f\;} & z \\
\uparrow & & \| \\
p & \xrightarrow{\;\mathcal{F}\;} & z
\end{array}$$

The move from two variables to three is straightforward. The meaning of a regular point of a function or of a locus carries over in a natural way, as does the geometric viewpoint generally. The following theorem and corollary are the main results. Their proofs follow the proofs of the two-variable versions (see the exercises), and also follow from the $n$-variable versions, below.


Theorem 6.3. Suppose $(a,b,c)$ is a regular point of a function $s = f(x,y,z)$ that has continuous first derivatives. Then there is a coordinate change $(u,v,w) = \mathbf{h}(x,y,z)$ defined on a window centered at $(a,b,c)$ that transforms the level sets of $f$ into the coordinate planes $w = \text{constant}$. □

Corollary 6.4 (Implicit function theorem) Suppose the function $s = f(x,y,z)$ has continuous first derivatives in some open neighborhood of a point $(a,b,c)$, and $f(a,b,c) = k$. If $f_z(a,b,c) \neq 0$, then there is a unique function $z = \varphi(x,y)$ defined on an open neighborhood $N$ of $(a,b)$ for which

• $f(x,y,\varphi(x,y)) = k$ for all $(x,y)$ in $N$.

• $\varphi(a,b) = c$.

• $\varphi$ has continuous first derivatives on $N$, and

$$\varphi_x(x,y) = -\frac{f_x(x,y,\varphi(x,y))}{f_z(x,y,\varphi(x,y))}, \qquad \varphi_y(x,y) = -\frac{f_y(x,y,\varphi(x,y))}{f_z(x,y,\varphi(x,y))}.$$

□


As stated, the corollary connects the condition that the partial derivative of $f$ with respect to $z$ is nonzero to the conclusion that $z$ depends on $x$ and $y$ near $(a,b,c)$. However, if instead it is the partial derivative with respect to either $y$ or $x$ that is nonzero, we get the same conclusion mutatis mutandis (“the necessary changes being made”). Corollary 6.4 thus stands for three different statements; for example, if $f_y(a,b,c) \neq 0$, then $y = \psi(x,z)$ for some $\psi$ and

$$\psi_x(x,z) = -\frac{f_x(x,\psi(x,z),z)}{f_y(x,\psi(x,z),z)}, \qquad \psi_z(x,z) = -\frac{f_z(x,\psi(x,z),z)}{f_y(x,\psi(x,z),z)}.$$



Even though the theorem and corollary above settle our questions about a function of three variables, it is still valuable to look at an individual level set and its linearization at a point, as we did above with a function of two variables. We see above, left, the zero level of

$$f(x,y,z) = x^2(x - 1) + y^2 + z^2;$$

on the right, we get some idea how the zero level is nested within the collection of nearby level sets. (Parts of some levels have been “peeled away” to help see inside.)

Regular points with
three variables

Different implications
of the corollary

An example

Linearization and
the tangent plane

We expect the locus $f(x,y,z) = k$ to be 2-dimensional, although it may fail to be a proper surface at one or more of its points. That is, a point may fail to be a regular point of the locus. This is what happens to $f(x,y,z) = 0$ at the origin, where it has the shape of the vertex of a cone: no coordinate change can convert the vertex into a simple plane. (In this example, however, every nearby level set contains only regular points of $f$, at which Theorem 6.3 applies.) In general, a locus $f(x,y,z) = k$ can exhibit all the irregularities that $f(x,y) = k$ did, and many more besides (see Exercise 6.13).

By considering the linearization of $s = f(x,y,z)$, we can see once again what prompts the partial derivative condition that leads us to an implicit function. We begin by assuming, as before, that $f$ has continuous partial derivatives. Then the first-order Taylor expansion of $f$ at a seed point (i.e., $f(a,b,c) = k$) is

$$f(x,y,z) = f(a,b,c) + f_x(a,b,c)\,\Delta x + f_y(a,b,c)\,\Delta y + f_z(a,b,c)\,\Delta z + O(2)$$

in a window centered at $(a,b,c)$. We take

$$f_x(a,b,c)\,\Delta x + f_y(a,b,c)\,\Delta y + f_z(a,b,c)\,\Delta z + O(2) = 0$$

to be the equation of the locus $f(x,y,z) = k$ in that window, and

$$f_x(a,b,c)\,\Delta x + f_y(a,b,c)\,\Delta y + f_z(a,b,c)\,\Delta z = 0$$

to be the linearization of that locus. If $(a,b,c)$ is a regular point of $f$, then at least one of the coefficients is nonzero, and the equation describes a plane. Because the difference between the locus and its linearization at $(a,b,c)$ vanishes at least to second order in $(\Delta x, \Delta y, \Delta z)$, the plane is tangent to the original locus at $(a,b,c)$.



[Figure: the locus $f(x,y,z) = k$ and its tangent plane $f_x(a,b,c)\,\Delta x + f_y(a,b,c)\,\Delta y + f_z(a,b,c)\,\Delta z = 0$ at $(a,b,c)$.]


More particularly, if $f_z(a,b,c) \neq 0$, then the linearization implicitly defines the linear function

$$\Delta z = -\frac{f_x(a,b,c)}{f_z(a,b,c)}\,\Delta x - \frac{f_y(a,b,c)}{f_z(a,b,c)}\,\Delta y,$$

and this is another version of the equation of the tangent plane at $(a,b,c)$. The analogy with the two-variable case means that the implicit function $z = \varphi(x,y)$ has this equation as its linear approximation near $(x,y) = (a,b)$, and

$$\varphi_x(a,b) = -\frac{f_x(a,b,c)}{f_z(a,b,c)}, \qquad \varphi_y(a,b) = -\frac{f_y(a,b,c)}{f_z(a,b,c)}.$$
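The tangency claim can be checked on the example $f(x,y,z) = x^2(x-1) + y^2 + z^2$. The sketch below assumes a regular point chosen here for illustration, $(a,b,c) = (1, 0.5, 0.5)$, where $k = 0.5$ and all three partial derivatives equal $1$. It compares the implicit function (available in closed form for this $f$) with its linearization along the diagonal step $\Delta x = \Delta y = t$; the helper names `tangent_dz` and `gap` are illustrative.

```python
import math

def f(x, y, z):
    return x**2 * (x - 1) + y**2 + z**2

def grad_f(x, y, z):
    return (3 * x**2 - 2 * x, 2 * y, 2 * z)

a, b, c = 1.0, 0.5, 0.5          # assumed regular point on the locus
k = f(a, b, c)
fx, fy, fz = grad_f(a, b, c)

def phi(x, y):
    # Implicit function z = phi(x, y) on the branch with z > 0.
    return math.sqrt(k - x**2 * (x - 1) - y**2)

def tangent_dz(dx, dy):
    # Linearization solved for Delta z (requires f_z != 0).
    return -(fx / fz) * dx - (fy / fz) * dy

def gap(t):
    # Distance between the locus and the tangent plane above (a+t, b+t).
    return abs((phi(a + t, b + t) - c) - tangent_dz(t, t))

print(gap(1e-2), gap(1e-3))
```

Shrinking the step by a factor of 10 shrinks the gap by roughly a factor of 100, which is the second-order vanishing described above.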


Finally, let us suppose the number of variables $x_1, x_2, \ldots, x_n$ is arbitrary, with a single relation $f(x_1,\ldots,x_n) = k$ holding between them. Then we expect one of the variables, say $x_n$, to depend on the others, implying there is a function $x_n = \varphi(x_1,\ldots,x_{n-1})$. The graph of $\varphi$ is an $(n-1)$-dimensional hypersurface in $\mathbb{R}^n$. Nearby level sets $f(x_1,\ldots,x_n) = \lambda$, for $\lambda$ near $k$, should be nested hypersurfaces that together fill a portion of $\mathbb{R}^n$. These expectations are borne out at a regular point of $f(\mathbf{x})$, that is, at a point $(a_1,\ldots,a_n)$ where at least one partial derivative $\partial f/\partial x_i\,(a_1,\ldots,a_n)$ is nonzero.

Theorem 6.5. Suppose the function $f : X^n \to \mathbb{R} : \mathbf{x} \mapsto f(\mathbf{x})$ has continuous first derivatives on $X^n$, and $\mathbf{x} = \mathbf{a}$ is a regular point of $s = f(\mathbf{x})$. Then there is a coordinate change $\mathbf{u} = \mathbf{h}(\mathbf{x})$ defined on a window $W^n$ centered at $\mathbf{x} = \mathbf{a}$ that transforms the level sets of $f$ into the coordinate hyperplanes $u_n = \text{constant}$.

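Proof. Because $\mathbf{a}$ is a regular point of $f$, at least one of the partial derivatives $f_i(\mathbf{a})$ is nonzero. (We define $f_i$ to be the partial derivative of $f$ with respect to the $i$-th variable, $x_i$.) By permuting the variables $x_j$, if necessary, we may suppose that $f_n(\mathbf{a}) \neq 0$. Define $\mathbf{h} : X^n \to \mathbb{R}^n : \mathbf{x} \mapsto \mathbf{u}$ by

$$\mathbf{h} : \begin{cases} u_1 = x_1, \\ \quad\vdots \\ u_{n-1} = x_{n-1}, \\ u_n = f(x_1,\ldots,x_n). \end{cases}$$

Then $\mathbf{h}$ is continuously differentiable on $X^n$ because $f$ is, and

$$d\mathbf{h}_{\mathbf{x}} = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ f_1(\mathbf{x}) & \cdots & f_{n-1}(\mathbf{x}) & f_n(\mathbf{x}) \end{pmatrix}.$$

Therefore $\det d\mathbf{h}_{\mathbf{a}} = f_n(\mathbf{a}) \neq 0$, and the inverse function theorem implies $\mathbf{h}$ is invertible on some neighborhood $W^n$ of $\mathbf{a}$. The coordinate change $\mathbf{h}$ transforms the level set $f(\mathbf{x}) = \lambda$ into the coordinate hyperplane $u_n = \lambda$. □

Regular points
with n variables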

Corollary 6.6 (Implicit function theorem) Suppose the function $s = f(x_1,\ldots,x_n)$ has continuous first derivatives on some neighborhood of a point $(a_1,\ldots,a_n)$, and $f(a_1,\ldots,a_n) = k$. If $f_n(a_1,\ldots,a_n) \neq 0$, then there is a unique function $x_n = \varphi(x_1,\ldots,x_{n-1})$ defined on an open neighborhood $N^{n-1}$ of $(a_1,\ldots,a_{n-1})$ for which

• $f(x_1,\ldots,x_{n-1},\varphi(x_1,\ldots,x_{n-1})) = k$ for all $(x_1,\ldots,x_{n-1})$ in $N^{n-1}$.

• $\varphi(a_1,\ldots,a_{n-1}) = a_n$.

• $\varphi$ has continuous first derivatives on $N^{n-1}$, and for $i = 1,\ldots,n-1$,

$$\varphi_i(x_1,\ldots,x_{n-1}) = -\frac{f_i(x_1,\ldots,x_{n-1},\varphi(x_1,\ldots,x_{n-1}))}{f_n(x_1,\ldots,x_{n-1},\varphi(x_1,\ldots,x_{n-1}))}.$$

Proof. Let $\mathbf{h}$ be the coordinate change in Theorem 6.5; because it is the identity on the first $n-1$ coordinates, it is a nonlinear shear that maps each vertical line $(x_1,\ldots,x_{n-1}) = \text{constant}$ to itself. Its inverse must do the same, and thus has the form

$$\mathbf{h}^{-1} : \begin{cases} x_1 = u_1, \\ \quad\vdots \\ x_{n-1} = u_{n-1}, \\ x_n = g(u_1,\ldots,u_n). \end{cases}$$

[Figure: the maps $\mathbf{h}$ and $\mathbf{h}^{-1}$ between the source window $W^n$ and the target $P^n$, drawn directly below it.]

Here $g$ is a real-valued function with continuous derivatives on a neighborhood $P^n$ of the image point $\mathbf{h}(\mathbf{a})$. Let $P^{n-1}$ be the intersection of $P^n$ and the horizontal plane $u_n = k$. Because $\mathbf{h}$ and $\mathbf{h}^{-1}$ preserve vertical lines, it is convenient to put the target space $(u_1,\ldots,u_n)$ directly below the source. Also, for visual clarity, $P^n$ is shown as the sheared image of a box $W^n$ centered at $\mathbf{a}$.

Because $\mathbf{h}^{-1} \circ \mathbf{h}$ is the identity, we can write

$$(x_1,\ldots,x_{n-1},x_n) = \mathbf{h}^{-1}(\mathbf{h}(x_1,\ldots,x_{n-1},x_n)) = \mathbf{h}^{-1}(u_1,\ldots,u_{n-1}, f(x_1,\ldots,x_{n-1},x_n)).$$

Now assume that the point $(u_1,\ldots,u_{n-1}, f(x_1,\ldots,x_{n-1},x_n))$ is in $P^{n-1}$. In particular, this means $f(x_1,\ldots,x_n) = k$, and we can write

$$(x_1,\ldots,x_{n-1},x_n) = \mathbf{h}^{-1}(u_1,\ldots,u_{n-1},k) = (x_1,\ldots,x_{n-1}, g(x_1,\ldots,x_{n-1},k)).$$

The figure makes it clear that $N^{n-1}$ (which we must still define) and $P^{n-1}$ are in the same vertical column. Thus we make $N^{n-1}$ the projection of $P^{n-1}$ to the coordinate plane $x_n = 0$; that is, $(x_1,\ldots,x_{n-1})$ is in $N^{n-1}$ if and only if $(x_1,\ldots,x_{n-1},k)$ is in $P^{n-1}$. Finally, if we set $\varphi(x_1,\ldots,x_{n-1}) = g(x_1,\ldots,x_{n-1},k)$ when $(x_1,\ldots,x_{n-1})$ is in $N^{n-1}$, then

$$f(x_1,\ldots,x_n) = k \iff x_n = \varphi(x_1,\ldots,x_{n-1})$$

and $\varphi(a_1,\ldots,a_{n-1}) = g(a_1,\ldots,a_{n-1},k) = a_n$. The expressions for the partial derivatives of $\varphi$ follow from the chain rule applied to

$$k = f(x_1,\ldots,x_{n-1},\varphi(x_1,\ldots,x_{n-1}));$$

we find $0 = f_i + f_n \cdot \varphi_i$, from which it follows that $\varphi_i = -f_i/f_n$. □

As it is written, the implicit function theorem assumes that $f_n(\mathbf{a}) \neq 0$ at the seed point $\mathbf{a}$. But suppose $f_n(\mathbf{a}) = 0$; the theorem still holds, mutatis mutandis, if some other partial derivative is nonzero there. For example, if $f_j(\mathbf{a}) \neq 0$, then we can solve for $x_j$ in terms of the other variables to get a function

$$x_j = \psi(x_1,\ldots,\widehat{x_j},\ldots,x_n).$$

Here the circumflex is used to indicate that the variable $x_j$ is missing from the list. The theorem implies $\psi(a_1,\ldots,\widehat{a_j},\ldots,a_n) = a_j$, and

$$\psi_i(x_1,\ldots,\widehat{x_j},\ldots,x_n) = -\frac{f_i(x_1,\ldots,\psi(x_1,\ldots,\widehat{x_j},\ldots,x_n),\ldots,x_n)}{f_j(x_1,\ldots,\psi(x_1,\ldots,\widehat{x_j},\ldots,x_n),\ldots,x_n)}$$

for every $i = 1,\ldots,n$ with $i \neq j$.
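As a concrete check of this symmetric form, the sketch below solves for the third of four variables. The function $f(x_1,x_2,x_3,x_4) = x_1 x_2 + x_3 e^{x_4}$, the level $k = 5$, and the seed $(1,2,3,0)$ are assumptions chosen here so that $\psi$ has a closed form; they do not come from the text. The formula $\psi_i = -f_i/f_j$ (with $j = 3$) is compared against central differences.

```python
import math

def f(x1, x2, x3, x4):
    return x1 * x2 + x3 * math.exp(x4)

def grad_f(x1, x2, x3, x4):
    e = math.exp(x4)
    return [x2, x1, e, x3 * e]   # f_1, f_2, f_3, f_4

k = 5.0
seed = (1.0, 2.0, 3.0, 0.0)      # f(seed) = 5 and f_3(seed) = 1 != 0

def psi(x1, x2, x4):
    # x3 solved from f = k in terms of the remaining variables.
    return (k - x1 * x2) * math.exp(-x4)

def psi_partials_formula(x1, x2, x4):
    # psi_i = -f_i / f_3 for i = 1, 2, 4, evaluated on the graph of psi.
    g = grad_f(x1, x2, psi(x1, x2, x4), x4)
    return [-g[0] / g[2], -g[1] / g[2], -g[3] / g[2]]

def psi_partials_numeric(x1, x2, x4, h=1e-6):
    # Central differences in each of the three free variables.
    p = [x1, x2, x4]
    out = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        out.append((psi(*hi) - psi(*lo)) / (2 * h))
    return out

print(psi_partials_formula(1.0, 2.0, 0.0))
print(psi_partials_numeric(1.0, 2.0, 0.0))
```

At the seed both computations give partial derivatives close to $-2$, $-1$, and $-3$.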

Suppose $z = f(\mathbf{x})$ has continuous second partial derivatives at $\mathbf{a}$, allowing us to write the first-order Taylor expansion of $f$ (in terms of window coordinates) at $\mathbf{a}$:

$$\Delta z = f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a}) = f_1(\mathbf{a})\,\Delta x_1 + \cdots + f_n(\mathbf{a})\,\Delta x_n + O(2) = df_{\mathbf{a}}(\Delta\mathbf{x}) + O(2).$$

If $\mathbf{a}$ is a regular point of $f$, then it follows from Theorem 6.5 that there are new curvilinear coordinates in which the higher-order terms $O(2)$ disappear: $f$ is transformed into precisely its linear approximation $df_{\mathbf{a}}$ near $\mathbf{a}$. The details are in the following corollary, which incidentally is stronger than Taylor’s theorem because it requires only continuous first partial derivatives for $f$.

Corollary 6.7 Suppose x = a is a regular point of the continuously differentiable 
function z = /(x). Then there is a coordinate change Av = g(Ax) in a window cen¬ 
tered at a for which 

Az = /(a + g“’(Av))-f(a) = d/ a (Av) = /i(a)Avi + • • •+/„(a) Av„. 


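Proof. Express the coordinate change $\mathbf{h}$ that is provided by Theorem 6.5 in window
coordinates $\Delta\mathbf{x}$ centered at $\mathbf{a}$ and $\Delta\mathbf{u}$ centered at $(a_1, \dots, a_{n-1}, f(\mathbf{a}))$ (so that $\Delta\mathbf{u} =
\mathbf{h}(\Delta\mathbf{x})$ and $\mathbf{h}(\mathbf{0}) = \mathbf{0}$):

\[
\mathbf{h} : \begin{cases}
\Delta u_1 = \Delta x_1, \\
\quad\vdots \\
\Delta u_{n-1} = \Delta x_{n-1}, \\
\Delta u_n = f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a});
\end{cases}
\qquad
d\mathbf{h}_{\mathbf{0}} =
\begin{pmatrix}
1 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 1 & 0 \\
f_1(\mathbf{a}) & \cdots & f_{n-1}(\mathbf{a}) & f_n(\mathbf{a})
\end{pmatrix}.
\]

In terms of these coordinates, $f$ is already transformed into the simple linear function
$\Delta z = \Delta u_n$. That is,

\[ \Delta z = f(\mathbf{a} + \mathbf{h}^{-1}(\Delta\mathbf{u})) - f(\mathbf{a}) = \Delta u_n. \]

“Symmetrizing” the implicit function theorem

Near a regular point $\mathbf{a}$, $f$ looks like $df_{\mathbf{a}}$

Near a regular point, $f$ becomes a coordinate function

In general, the locus is a space curve
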

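Now consider the linear map $\Delta\mathbf{u} = d\mathbf{h}_{\mathbf{0}}(\Delta\mathbf{v})$, whose $n$th component function is

\[ \Delta u_n = f_1(\mathbf{a})\,\Delta v_1 + \cdots + f_n(\mathbf{a})\,\Delta v_n. \]

Thus, if we set $\mathbf{g} = d\mathbf{h}_{\mathbf{0}}^{-1} \circ \mathbf{h}$ (so that $\mathbf{g}^{-1} = \mathbf{h}^{-1} \circ d\mathbf{h}_{\mathbf{0}}$), then $\mathbf{g}$ is a valid change of
window coordinates in a window centered at $\mathbf{a}$, and we get

\[
\begin{aligned}
\Delta z &= f(\mathbf{a} + \mathbf{g}^{-1}(\Delta\mathbf{v})) - f(\mathbf{a}) \\
&= f(\mathbf{a} + \mathbf{h}^{-1}(\Delta\mathbf{u})) - f(\mathbf{a}) \\
&= f_1(\mathbf{a})\,\Delta v_1 + \cdots + f_n(\mathbf{a})\,\Delta v_n. \qquad \Box
\end{aligned}
\]

Although there is a certain “fitness” to showing that a function can be transformed
exactly into its derivative at a regular point, it is useful to know that it can
also be transformed into a simple coordinate function. In other words, there is a
curvilinear coordinate system in which one of the coordinates is just the value of $f$
there. This result, stated in the next corollary, has already been demonstrated in the
last proof.

Corollary 6.8 Suppose $\mathbf{x} = \mathbf{a}$ is a regular point of the continuously differentiable
function $z = f(\mathbf{x})$. Then there is a coordinate change $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$ in a window
centered at $\mathbf{a}$ for which

\[ \Delta z = f(\mathbf{a} + \mathbf{h}^{-1}(\Delta\mathbf{u})) - f(\mathbf{a}) = \Delta u_n. \]

□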
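The two corollaries can be made concrete with a function simple enough that the coordinate changes have closed forms. The sketch below is an illustration under assumptions of our own: the function $f(x,y) = x^2 + y$ and the seed point $\mathbf{a} = (1, 2)$ are hypothetical, chosen so that $f_y(\mathbf{a}) = 1 \ne 0$. It builds $\mathbf{h}$ (which turns $f$ into the last coordinate) and $\mathbf{g}^{-1} = \mathbf{h}^{-1} \circ d\mathbf{h}_{\mathbf{0}}$ (which turns $f$ into $df_{\mathbf{a}}$).

```python
A = (1.0, 2.0)  # hypothetical seed point a

def f(x, y):
    # Hypothetical example function with f_x(A) = 2 and f_y(A) = 1.
    return x * x + y

def delta_z(dx, dy):
    # Increment of f in window coordinates centered at A.
    return f(A[0] + dx, A[1] + dy) - f(A[0], A[1])

def h(dx, dy):
    # Keep the first window coordinate; replace the last by the increment of f.
    return (dx, delta_z(dx, dy))

def h_inv(du1, du2):
    # Here du2 = 2*dx + dx**2 + dy with dx = du1, so dy can be solved exactly.
    return (du1, du2 - 2.0 * du1 - du1 * du1)

def dh0(dv1, dv2):
    # Derivative of h at the origin: rows (1, 0) and (f_x(A), f_y(A)) = (2, 1).
    return (dv1, 2.0 * dv1 + dv2)

def g_inv(dv1, dv2):
    # g = dh0^{-1} o h, hence g^{-1} = h^{-1} o dh0.
    return h_inv(*dh0(dv1, dv2))

def df_a(dv1, dv2):
    # The linear approximation of f at A.
    return 2.0 * dv1 + 1.0 * dv2

if __name__ == "__main__":
    print(delta_z(*h_inv(0.3, -0.2)))                   # equals the last coordinate, -0.2
    print(delta_z(*g_inv(0.3, -0.2)), df_a(0.3, -0.2))  # the two values agree
```

In this example both identities hold exactly, not just to first order, which is the point of the corollaries.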


6.2 A pair of equations 

We now suppose that two separate conditions have been imposed on our variables:

\[ f(x_1, \dots, x_n) = k, \qquad g(x_1, \dots, x_n) = l. \]

Because we expect these conditions will allow us to solve for two of the variables
in terms of the remaining ones, it is natural to assume that $n > 2$, that is, that there
are more variables than conditions. However, if we set aside the matter of implicit
functions for the moment, and just consider the geometric implications of the two
conditions, there is no reason to exclude $n = 1$ or $2$. We take up this possibility in
the last section (cf. pp. 214ff.). To begin, however, we assume $n = 3$; this is the most
complicated case that we can visualize fully.

Thus, we are dealing with two conditions $f(x,y,z) = k$ and $g(x,y,z) = l$ on three
variables. We expect the locus $f = k$ to be a 2-dimensional surface $S_f$ in $\mathbb{R}^3$, and
$g = l$ to be another such surface, $S_g$. Of course, either of these could fail to look like
an ordinary surface at one or more points; for the moment, though, let us assume
they are completely regular. The locus determined by two conditions together is the
intersection $S_f \cap S_g$. We expect the intersection of two surfaces to be a curve in
space, but it may not be, even when the surfaces themselves are regular; see the
counterexamples below.



For simplicity, let us assume the intersection locus is, indeed, an ordinary space
curve. We should then be able to describe it parametrically:

\[ S_f \cap S_g : (x(t), y(t), z(t)), \qquad t_1 < t < t_2. \]

However, we should not be introducing a new variable $t$. In the spirit of implicit
functions, we should, instead, try to express two of $x$, $y$, $z$ in terms of the remaining
one, and involve no new variable at all. To recover these implicit functions from
the parametrization, let us suppose that $x = x(t)$ is invertible, perhaps for $t$ in some
smaller interval. If $t = t(x)$ is the inverse (on $x_1 < x < x_2$), then

\[ (x(t), y(t), z(t)) = (x, y(t(x)), z(t(x))) = (x, \varphi(x), \psi(x)). \]

If $S_f \cap S_g$ has more than one point with a given $x$-value (as in the figure above), we
cannot parametrize all of it by $x$. Restricting the values of $x$ as necessary, we can
solve for $y$ and $z$ in terms of $x$:

\[
\left.\begin{aligned} f(x,y,z) &= k, \\ g(x,y,z) &= l, \end{aligned}\right\}
\Longrightarrow
\left\{\begin{aligned} y &= \varphi(x), \\ z &= \psi(x), \end{aligned}\right.
\qquad x_1 < x < x_2.
\]

Although we can expect $x(t)$ to be invertible on only a part of the locus $S_f \cap S_g$, on
a different part we may find that $y(t)$ is invertible. On that subset we can solve for $x$
and $z$ in terms of $y$; on a subset where $z(t)$ is invertible, we can solve for $x$ and $y$ in
terms of $z$. To summarize: two conditions on three variables implicitly define two of
the variables in terms of the third.

Before we discuss precise conditions that allow us to determine those implicit
functions, let us see how two ordinary surfaces can fail to intersect in a curve. Thus,
we suppose $(a,b,c)$ is a regular point on each of the surfaces

\[ S_f : f(a,b,c) = k, \qquad S_g : g(a,b,c) = l. \]

This means each locus has a well-defined tangent plane at $(a,b,c)$; moreover (Theorem 6.3,
p. 193), each locus can be locally transformed into a flat plane by a suitable
coordinate change. For these reasons we can regard each locus as a regular surface
near $(a,b,c)$.

Solving for two of the variables

Faulty intersections

Transversality and general position




In each of the following three examples, $(a,b,c)$ is the origin. Moreover, each
surface is the graph of a continuously differentiable function, so every point is regular.
The first example is

\[ S_f : x^2 + y^2 - z = 0, \qquad S_g : -x^2 - y^2 - z = 0. \]

The surfaces are parabolic bowls that meet only at the origin; $S_f \cap S_g$ is a point, not
a curve. The second example is

\[ S_f : x^2 - y^2 - z = 0, \qquad S_g : z = 0. \]

This time the intersection is a pair of crossed lines ($x^2 = y^2$ in the plane $z = 0$), so, once
again, it is not a curve. For the third example, let $S_f$ be the graph of the “flat” function
defined on page 186, and let $S_g$ again be the horizontal plane $z = 0$. The intersection is the
whole unit disk in the $(x,y)$-plane.

The problem with each example is that, even though the two surfaces meet at 
the origin, they do not cut cleanly across each other at that point. We say that the 
surfaces fail to be transverse there. Surfaces whose intersections are all transverse 
are also said to be in general position with respect to each other. We prove that 
the intersection of two regular surfaces in general position is a regular curve. To 
do this, we need to make our informal definition of transversality precise. The key 
is to note that whenever two surfaces are transverse in the informal sense, so are 
their tangent planes. But it is much easier to check transversality for the planes than 
for the surfaces: two planes passing through the same point are either different or 
identical. 

Definition 6.3 We say that two surfaces in $\mathbb{R}^3$ are transverse at a regular point of
intersection if they have different tangent planes at that point.

For surfaces that are given as the loci of equations, the following theorem gives us
a simple and convenient analytic criterion for transversality.

Theorem 6.9. Suppose $\mathbf{a} = (a,b,c)$ lies on both surfaces $S_f : f(x,y,z) = k$ and $S_g :
g(x,y,z) = l$, and is a regular point of both $f$ and $g$. Then $S_f$ and $S_g$ are transverse
at $\mathbf{a}$ if and only if the matrix

\[
M = \begin{pmatrix} f_x(\mathbf{a}) & f_y(\mathbf{a}) & f_z(\mathbf{a}) \\ g_x(\mathbf{a}) & g_y(\mathbf{a}) & g_z(\mathbf{a}) \end{pmatrix}
\]

has rank 2.

Proof. The equations of the tangent planes of $S_f$ and $S_g$ at $\mathbf{a}$ are, respectively,

\[
\begin{aligned}
f_x(\mathbf{a})\,\Delta x + f_y(\mathbf{a})\,\Delta y + f_z(\mathbf{a})\,\Delta z &= 0, \\
g_x(\mathbf{a})\,\Delta x + g_y(\mathbf{a})\,\Delta y + g_z(\mathbf{a})\,\Delta z &= 0.
\end{aligned}
\]

Because $\mathbf{a}$ is a regular point of both functions, each of these equations has at least
one nonzero coefficient, and thus determines a well-defined plane. The two planes
are different if and only if their coefficient vectors
$(f_x(\mathbf{a}), f_y(\mathbf{a}), f_z(\mathbf{a}))$ and $(g_x(\mathbf{a}), g_y(\mathbf{a}), g_z(\mathbf{a}))$

are not scalar multiples of each other, and this is true if and only if the matrix $M$ has
rank 2. □
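The rank criterion is easy to test by machine. The sketch below is a minimal illustration, not part of the text: it uses the fact that the three $2 \times 2$ minors of $M$ are, up to sign, the components of the cross product of its rows, so $M$ has rank 2 exactly when that cross product is nonzero. The gradient values are assumptions for the example: $(0,0,-1)$ is the gradient at the origin of a surface $z = (\text{quadratic in } x, y)$ written as a locus, $(0,0,1)$ is the gradient of the plane $z = 0$, and the sphere-and-plane pair at $(1,0,0)$ is a hypothetical transverse case.

```python
def cross(u, v):
    # Cross product of two vectors in R^3; its components are the 2x2 minors of M.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def is_transverse(grad_f, grad_g, tol=1e-12):
    # M (rows grad_f, grad_g) has rank 2 exactly when some 2x2 minor is nonzero.
    return any(abs(c) > tol for c in cross(grad_f, grad_g))

if __name__ == "__main__":
    # Two bowls tangent at the origin: both gradients are (0, 0, -1).
    print(is_transverse((0.0, 0.0, -1.0), (0.0, 0.0, -1.0)))  # False
    # A surface with horizontal tangent plane and the plane z = 0.
    print(is_transverse((0.0, 0.0, -1.0), (0.0, 0.0, 1.0)))   # False
    # Hypothetical transverse case: sphere x^2+y^2+z^2 = 1 and plane z = 0 at (1, 0, 0).
    print(is_transverse((2.0, 0.0, 0.0), (0.0, 0.0, 1.0)))    # True
```

The test only inspects first derivatives at one point, which is exactly what Theorem 6.9 says is sufficient.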

We now return to the question of implicit functions. As in the past, the answer is a
consequence of a coordinate change that makes the loci straight. Thus, instead of just
$S_f : f(x,y,z) = k$, we look at the whole family of nearby surfaces $f(x,y,z) = \kappa$, for
$\kappa \approx k$. These are nested surfaces that fill a region containing the seed point $(a,b,c)$.
Likewise, we look at the family of surfaces $g(x,y,z) = \lambda$ ($\lambda \approx l$) that are nested
around $S_g$; they too fill a region around the seed point. If $S_f$ and $S_g$ are transverse
at $(a,b,c)$, then (as we prove in a moment) all surfaces in the first family are in
general position with respect to all those in the second. After they are straightened
by the coordinate change, members of the two families look like the spacers in a
case of wine bottles; they intersect in parallel straight lines.

The figure above suggests we should consolidate the functions $f$ and $g$ into a
map $\mathbf{f} : X^3 \to \mathbb{R}^2$,

\[ \mathbf{f} : \begin{cases} s = f(x,y,z), \\ t = g(x,y,z). \end{cases} \]

Then the surface $S_f$ is the pullback of the (vertical) coordinate line $s = k$ by $\mathbf{f}$:

\[ \mathbf{f}^{-1}(k, t) = \{(x,y,z) : f(x,y,z) = k\}. \]

The other surfaces in the same family are the pullbacks $\mathbf{f}^{-1}(\kappa, t)$ of the other vertical
coordinate lines. The pullbacks $\mathbf{f}^{-1}(s, \lambda)$ of the horizontal coordinate lines are the
second family of surfaces, that is, the ones nested with $S_g$. The intersection curve
$S_f \cap S_g$ is the pullback of the single point $(k,l)$ in the $(s,t)$-plane. (The terms locus
and pullback are roughly equivalent. The first is older and commonly used with
real-valued functions; the second is used more generally with maps to an arbitrary
target.)

Straightening the surfaces

The map defined by $f$ and $g$

Extensions of the theorem

The map $\mathbf{f}$ also gives us a convenient way to indicate when $S_f$ and $S_g$ are transverse
at the seed point $\mathbf{a} = (a,b,c)$, because the matrix $M$ of the previous theorem
is the derivative of $\mathbf{f}$ at $\mathbf{a}$. Here is the theorem that “straightens” the surfaces $f = \kappa$
and $g = \lambda$ simultaneously.

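Theorem 6.10. Let $\mathbf{f} : X^3 \to \mathbb{R}^2$ be continuously differentiable in a neighborhood $X^3$
of a point $\mathbf{a} = (a,b,c)$, let $\mathbf{f}(a,b,c) = (k,l)$, and assume the derivative $d\mathbf{f}_{\mathbf{a}} : \mathbb{R}^3 \to
\mathbb{R}^2$ has maximal rank. Then, on a smaller neighborhood $N^3$ of $(a,b,c)$, there is a
coordinate change $\mathbf{h} : N^3 \to \mathbb{R}^3$ that maps each pullback $\mathbf{f}^{-1}(\kappa, \lambda)$ to the coordinate
line $v = \kappa$, $w = \lambda$ in $\mathbf{h}(N^3)$.

Proof. The $2 \times 3$ matrix $d\mathbf{f}_{\mathbf{a}}$ has rank 2; therefore it has two linearly independent
columns. By permuting the variables $x$, $y$, $z$, if necessary, we may assume the second
and third columns are linearly independent. Thus,

\[
D(\mathbf{x}) = \det \begin{pmatrix} f_y(\mathbf{x}) & f_z(\mathbf{x}) \\ g_y(\mathbf{x}) & g_z(\mathbf{x}) \end{pmatrix} \ne 0
\]

when $\mathbf{x} = \mathbf{a}$. Because $\mathbf{f}$ is continuously differentiable, $D(\mathbf{x})$ is a continuous function
of $\mathbf{x}$ and therefore remains nonzero in some neighborhood $Y^3$ of $\mathbf{a}$. Define $\mathbf{h} : Y^3 \to
\mathbb{R}^3$ by

\[ \mathbf{h} : \begin{cases} u = x, \\ v = f(x,y,z), \\ w = g(x,y,z). \end{cases} \]

Then $\mathbf{h}$ is continuously differentiable, and

\[
d\mathbf{h}_{\mathbf{x}} = \begin{pmatrix} 1 & 0 & 0 \\ f_x(\mathbf{x}) & f_y(\mathbf{x}) & f_z(\mathbf{x}) \\ g_x(\mathbf{x}) & g_y(\mathbf{x}) & g_z(\mathbf{x}) \end{pmatrix}.
\]

By construction, $\det d\mathbf{h}_{\mathbf{x}} = D(\mathbf{x}) \ne 0$ for all $\mathbf{x}$ in $Y^3$. According to the inverse function
theorem, $\mathbf{h}$ (on a possibly smaller neighborhood $N^3$) has a continuously differentiable
inverse $\mathbf{h}^{-1}$. By the definition of $\mathbf{h}$,

\[
\begin{aligned}
f(x,y,z) = \kappa &\iff v = \kappa, \\
g(x,y,z) = \lambda &\iff w = \lambda,
\end{aligned}
\]

whenever $(x,y,z)$ is in $N^3$. □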

Note that the proof goes beyond what the theorem states: it shows that the coordinate
change transforms the individual surfaces $f(x,y,z) = \kappa$ into the coordinate
planes $v = \kappa$, and the surfaces $g(x,y,z) = \lambda$ into the coordinate planes $w = \lambda$ of a
second family. Moreover, because the two families of coordinate planes are obviously
in general position with respect to each other, the same must be true of the
original curved families.

The figure also makes it clear that the map

\[ \mathbf{f} \circ \mathbf{h}^{-1} : (u,v,w) \mapsto (s,t) \]

is just projection along the first component. That is, each coordinate line $(u,v,w) =
(u,\kappa,\lambda)$ parallel to the $u$-axis projects to the single point $(\kappa,\lambda)$. Putting it another
way: the pullback (by $\mathbf{f} \circ \mathbf{h}^{-1}$) of any point in the $(s,t)$-plane is the line parallel to
the $u$-axis that projects to that point.

Corollary 6.11 (Implicit function theorem) Let $f(x,y,z)$ and $g(x,y,z)$ have continuous
first derivatives in some neighborhood of a point $\mathbf{a} = (a,b,c)$, and let
$f(a,b,c) = k$, $g(a,b,c) = l$. If the determinant

\[ \begin{vmatrix} f_y(\mathbf{a}) & f_z(\mathbf{a}) \\ g_y(\mathbf{a}) & g_z(\mathbf{a}) \end{vmatrix} \]

is nonzero, then there are unique functions $y = \varphi(x)$, $z = \psi(x)$ defined on an open
interval $I$ containing $x = a$ for which

• $f(x, \varphi(x), \psi(x)) = k$ and $g(x, \varphi(x), \psi(x)) = l$ for all $x$ in $I$.

• $\varphi(a) = b$, $\psi(a) = c$.

• $\varphi$ and $\psi$ are continuously differentiable on $I$, and

\[
\varphi'(x) = -\frac{\begin{vmatrix} f_x & f_z \\ g_x & g_z \end{vmatrix}}{\begin{vmatrix} f_y & f_z \\ g_y & g_z \end{vmatrix}},
\qquad
\psi'(x) = -\frac{\begin{vmatrix} f_y & f_x \\ g_y & g_x \end{vmatrix}}{\begin{vmatrix} f_y & f_z \\ g_y & g_z \end{vmatrix}},
\]

where every partial derivative is evaluated at $(x, \varphi(x), \psi(x))$.

Proof. Let $\mathbf{h}$ be the coordinate change in Theorem 6.10. Because $\mathbf{h}$ is the identity
on the first coordinate, the same must be true of its inverse:

\[ \mathbf{h}^{-1} : \begin{cases} x = u, \\ y = p(u,v,w), \\ z = q(u,v,w). \end{cases} \]

Because $\mathbf{h}^{-1} \circ \mathbf{h}$ is the identity where it is defined,

\[
\begin{aligned}
(x,y,z) &= \mathbf{h}^{-1} \circ \mathbf{h}(x,y,z) \\
&= \mathbf{h}^{-1}(x, f(x,y,z), g(x,y,z)) \\
&= (x, p(x, f(x,y,z), g(x,y,z)), q(x, f(x,y,z), g(x,y,z))),
\end{aligned}
\]

implying

\[ y = p(x, f(x,y,z), g(x,y,z)), \qquad z = q(x, f(x,y,z), g(x,y,z)). \]

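These equations reduce to

\[ y = p(x,k,l), \qquad z = q(x,k,l), \]

when $f(x,y,z) = k$ and $g(x,y,z) = l$, that is, when $(x,y,z)$ lies in $S_f \cap S_g$. Let
$p(x,k,l) = \varphi(x)$ and $q(x,k,l) = \psi(x)$; as components of the coordinate change $\mathbf{h}^{-1}$,
these functions are continuously differentiable in an open neighborhood of $x = a$.
By construction,

\[
\begin{aligned}
k &= f(x,y,z) = f(x, \varphi(x), \psi(x)), \\
l &= g(x,y,z) = g(x, \varphi(x), \psi(x)),
\end{aligned}
\]

verifying the first condition on $\varphi$ and $\psi$.

Because $\mathbf{h}(a,b,c) = (a,k,l)$, it follows that $\mathbf{h}^{-1}(a,k,l) = (a,b,c)$. In terms of
components,

\[ (a,b,c) = \mathbf{h}^{-1}(a,k,l) = (a, p(a,k,l), q(a,k,l)) = (a, \varphi(a), \psi(a)), \]

so $b = \varphi(a)$ and $c = \psi(a)$, thus verifying the second condition.

We obtain the derivatives of $\varphi$ and $\psi$ by applying the chain rule to the equations

\[ k = f(x, \varphi(x), \psi(x)), \qquad l = g(x, \varphi(x), \psi(x)). \]

Suppressing the arguments of the functions for clarity, we find

\[
\begin{aligned}
0 &= \frac{dk}{dx} = \frac{d}{dx} f(x, \varphi(x), \psi(x)) = f_x + f_y \cdot \varphi' + f_z \cdot \psi', \\
0 &= \frac{dl}{dx} = \frac{d}{dx} g(x, \varphi(x), \psi(x)) = g_x + g_y \cdot \varphi' + g_z \cdot \psi'.
\end{aligned}
\]

If we write these equations in the matrix form

\[
\begin{pmatrix} f_y & f_z \\ g_y & g_z \end{pmatrix} \begin{pmatrix} \varphi' \\ \psi' \end{pmatrix} = \begin{pmatrix} -f_x \\ -g_x \end{pmatrix},
\]

we can solve them using Cramer’s rule to get

\[
\varphi' = \frac{\begin{vmatrix} -f_x & f_z \\ -g_x & g_z \end{vmatrix}}{\begin{vmatrix} f_y & f_z \\ g_y & g_z \end{vmatrix}}
= -\frac{\begin{vmatrix} f_x & f_z \\ g_x & g_z \end{vmatrix}}{\begin{vmatrix} f_y & f_z \\ g_y & g_z \end{vmatrix}},
\qquad
\psi' = \frac{\begin{vmatrix} f_y & -f_x \\ g_y & -g_x \end{vmatrix}}{\begin{vmatrix} f_y & f_z \\ g_y & g_z \end{vmatrix}}
= -\frac{\begin{vmatrix} f_y & f_x \\ g_y & g_x \end{vmatrix}}{\begin{vmatrix} f_y & f_z \\ g_y & g_z \end{vmatrix}}.
\]

We can also express the hypothesis and conclusion of the implicit function theorem
in terms of Jacobians (cf. p. 137). The hypothesis is

\[ \left. \frac{\partial(f,g)}{\partial(y,z)} \right|_{\mathbf{x} = \mathbf{a}} \ne 0, \]

and the implicit functions $y = \varphi(x)$, $z = \psi(x)$ have derivatives given by

\[
\frac{dy}{dx} = -\frac{\partial(f,g)}{\partial(x,z)} \bigg/ \frac{\partial(f,g)}{\partial(y,z)},
\qquad
\frac{dz}{dx} = -\frac{\partial(f,g)}{\partial(y,x)} \bigg/ \frac{\partial(f,g)}{\partial(y,z)}.
\]

Expressed this way, the derivatives are strikingly similar in form to the derivative
of the function $y = \varphi(x)$ that is implicitly defined by the single equation $f(x,y) = k$
(Theorem 6.1, p. 189):

\[ \frac{dy}{dx} = -\frac{\partial f}{\partial x} \bigg/ \frac{\partial f}{\partial y}. \]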

Let us return to the question we have already addressed several times before
(Chapters 4.2, 4.3, 5.2, 5.3, especially pp. 119-121, 128-128, 163-165, 175-176):
to what extent, and in what way, does a map look like its linear approximation
near a given point? Theorem 6.10 deals with a map $\mathbf{f} : X^3 \to \mathbb{R}^2$. Under the
assumption that $d\mathbf{f}_{\mathbf{a}}$ has maximal rank (namely 2), it shows that a suitable coordinate
change will make $\mathbf{f}$ look like the linear projection $\Pi : \mathbb{R}^{1+2} \to \mathbb{R}^2 : (x,y,z) \mapsto (y,z)$.
But according to Theorem 2.19 (p. 50), a coordinate change will likewise make $d\mathbf{f}_{\mathbf{a}}$
into the same projection $\Pi$. Coordinate changes thus make $\mathbf{f}$ look like $d\mathbf{f}_{\mathbf{a}}$ near $\mathbf{a}$.
Maximal rank is essential. To see this, consider the map $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2 : (x,y,z) \mapsto
(x, (y - z)^3)$. We have

\[
d\mathbf{f}_{\mathbf{x}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3(y-z)^2 & -3(y-z)^2 \end{pmatrix},
\]

so the rank of $d\mathbf{f}_{\mathbf{x}}$ is only 1, not 2, at all points in the plane $z = y$ (thus including the
origin). Near the origin, $\mathbf{f}$ is geometrically different from $d\mathbf{f}_{\mathbf{0}}$; see Exercise 6.16.

Jacobians

When does $\mathbf{f}$ “look like” $d\mathbf{f}_{\mathbf{a}}$?

Maximal rank is essential

6.3 The general case 

In the general case, $p$ equations constrain the values of $k+p$ variables. We expect to
find that $p$ of the variables are implicitly determined by the remaining $k$. Under what
conditions can we guarantee that happens, and which variables will be functions of
which? Here is the same question, in geometric terms: given a map from (an open
set in) $\mathbb{R}^{k+p}$ to $\mathbb{R}^p$, what does the pullback of a point look like? As we have already
seen in the low-dimension cases, an answer to the first will follow readily from an
answer to the second.

6 Implicit Functions

Partial derivatives

Notation: $\partial_1 = \partial_{\mathbf{x}}$

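Because the source of the map is split into two factors, with $k$ real variables in
one and $p$ in the other, it is useful to split the derivative of the map into the two
parts (its “partial” derivatives) that act separately on these two factors. To define
them, we assume $\mathbf{f} : X^{k+p} \to \mathbb{R}^n : (\mathbf{x},\mathbf{y}) \mapsto \mathbf{z}$ is differentiable and $\mathbf{x} = (x_1, \dots, x_k)$,
$\mathbf{y} = (y_1, \dots, y_p)$. If

\[
\mathbf{f} : \begin{cases}
z_1 = f_1(x_1, \dots, x_k, y_1, \dots, y_p), \\
\quad\vdots \\
z_n = f_n(x_1, \dots, x_k, y_1, \dots, y_p),
\end{cases}
\]

then the derivative of $\mathbf{f}$ is given by the $n \times (k+p)$ matrix

\[
d\mathbf{f}_{(\mathbf{x},\mathbf{y})} =
\begin{pmatrix}
f_{11} & \cdots & f_{1k} & f_{1,k+1} & \cdots & f_{1,k+p} \\
\vdots & & \vdots & \vdots & & \vdots \\
f_{n1} & \cdots & f_{nk} & f_{n,k+1} & \cdots & f_{n,k+p}
\end{pmatrix},
\]

where

\[
f_{ij} = \begin{cases}
\dfrac{\partial f_i}{\partial x_j}(\mathbf{x},\mathbf{y}) & \text{if } j = 1, \dots, k, \\[2ex]
\dfrac{\partial f_i}{\partial y_q}(\mathbf{x},\mathbf{y}) & \text{if } j = k+q \text{ and } q = 1, \dots, p,
\end{cases}
\]

and $i = 1, \dots, n$.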

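Definition 6.4 The partial derivatives of $\mathbf{f} : X^{k+p} \to \mathbb{R}^n$ are the linear maps $\partial_1\mathbf{f}_{(\mathbf{x},\mathbf{y})} =
\partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\mathbf{y})} : \mathbb{R}^k \to \mathbb{R}^n$ and $\partial_2\mathbf{f}_{(\mathbf{x},\mathbf{y})} = \partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\mathbf{y})} : \mathbb{R}^p \to \mathbb{R}^n$ given by the matrices

\[
\partial_1\mathbf{f}_{(\mathbf{x},\mathbf{y})} =
\begin{pmatrix} f_{11} & \cdots & f_{1k} \\ \vdots & & \vdots \\ f_{n1} & \cdots & f_{nk} \end{pmatrix},
\qquad
\partial_2\mathbf{f}_{(\mathbf{x},\mathbf{y})} =
\begin{pmatrix} f_{1,k+1} & \cdots & f_{1,k+p} \\ \vdots & & \vdots \\ f_{n,k+1} & \cdots & f_{n,k+p} \end{pmatrix}.
\]

If the derivative of $\mathbf{f}$ is continuous, then so are its partial derivatives. The notation
“$\partial_1$” signifies the partial derivative with respect to the first factor, and the alternate
notation “$\partial_{\mathbf{x}}$” signifies the partial derivative with respect to the $\mathbf{x}$ factor. As we have
done for functions of two real variables (e.g., as with $f_1(x,y) = f_x(x,y)$), we use
these notations interchangeably.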

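Theorem 6.12. Suppose the map $\mathbf{f} : X^{k+p} \to \mathbb{R}^p$ is continuously differentiable, and
the derivative $d\mathbf{f}_{(\mathbf{a},\mathbf{b})} : \mathbb{R}^{k+p} \to \mathbb{R}^p$ has maximal rank $p$. Then there is a coordinate
change $\mathbf{h} : (\mathbf{x},\mathbf{y}) \mapsto (\mathbf{u},\mathbf{v})$ defined in a neighborhood $N^{k+p}$ of $(\mathbf{a},\mathbf{b})$ that transforms
$\mathbf{f}$ into the projection $\Pi : (\mathbf{u},\mathbf{v}) \mapsto \mathbf{v}$; that is, $\mathbf{f} \circ \mathbf{h}^{-1} = \Pi$.

Proof. We know $p$ columns of $d\mathbf{f}_{(\mathbf{a},\mathbf{b})}$ are linearly independent. By permuting the variables,
if necessary, we may assume that the final $p$ columns are, so the partial derivative
$\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{a},\mathbf{b})} : \mathbb{R}^p \to \mathbb{R}^p$ is invertible. Now use the component functions of $\mathbf{f}$ to define

\[
\mathbf{h} : \begin{cases}
u_1 = x_1, \\
\quad\vdots \\
u_k = x_k, \\
v_1 = f_1(x_1, \dots, x_k, y_1, \dots, y_p), \\
\quad\vdots \\
v_p = f_p(x_1, \dots, x_k, y_1, \dots, y_p);
\end{cases}
\qquad \text{schematically,} \quad
\mathbf{h} : \begin{cases} \mathbf{u} = \mathbf{x}, \\ \mathbf{v} = \mathbf{f}(\mathbf{x},\mathbf{y}). \end{cases}
\]

Then $\mathbf{h}$ is continuously differentiable on $X^{k+p}$ (because $\mathbf{f}$ is), and

\[
d\mathbf{h}_{(\mathbf{x},\mathbf{y})} = \begin{pmatrix} I & O \\ \partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\mathbf{y})} & \partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\mathbf{y})} \end{pmatrix},
\qquad
\det d\mathbf{h}_{(\mathbf{x},\mathbf{y})} = \det \partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\mathbf{y})},
\]

implying that $\det d\mathbf{h}_{(\mathbf{a},\mathbf{b})} \ne 0$. By the inverse function theorem (Theorem 5.2, p. 169),
$\mathbf{h}$ is invertible on some smaller neighborhood $N^{k+p}$ of $(\mathbf{a},\mathbf{b})$.

To show that $\mathbf{f} \circ \mathbf{h}^{-1} = \Pi$, first write $\mathbf{h}^{-1}$ schematically as

\[ \mathbf{h}^{-1} : \begin{cases} \mathbf{x} = \mathbf{u}, \\ \mathbf{y} = \mathbf{g}(\mathbf{u},\mathbf{v}), \end{cases} \]

for a suitable map $\mathbf{g} : \mathbf{h}(N^{k+p}) \to \mathbb{R}^p$. By the definition of an inverse,

\[ (\mathbf{u},\mathbf{v}) = \mathbf{h} \circ \mathbf{h}^{-1}(\mathbf{u},\mathbf{v}) = \mathbf{h}(\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{v})) = (\mathbf{u}, \mathbf{f}(\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{v}))), \]

implying

\[ \mathbf{v} = \mathbf{f}(\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{v})) = \mathbf{f}(\mathbf{h}^{-1}(\mathbf{u},\mathbf{v})), \]

as desired. More simply, we know $\mathbf{f} \circ \mathbf{h}^{-1}(\mathbf{u},\mathbf{v}) = \mathbf{v}$ because $\mathbf{f}$ is the second component
of $\mathbf{h}$, and $\mathbf{h} \circ \mathbf{h}^{-1}(\mathbf{u},\mathbf{v}) = (\mathbf{u},\mathbf{v})$. □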
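A small numerical experiment shows the straightening at work in the simplest case $k = p = 1$. Everything specific in the sketch is an assumption for illustration: the function $f(x,y) = x + y + y^3$ is hypothetical, chosen because $\partial_y f = 1 + 3y^2$ is never zero, and the inverse of $\mathbf{h}(x,y) = (x, f(x,y))$ is computed by Newton's method rather than by the inverse function theorem.

```python
def f(x, y):
    # Hypothetical example with f_y = 1 + 3*y**2 > 0 everywhere.
    return x + y + y ** 3

def h(x, y):
    # The coordinate change of the theorem: u = x, v = f(x, y).
    return (x, f(x, y))

def h_inv(u, v, iterations=60):
    # Solve y + y**3 = v - u for y by Newton's method; x = u is unchanged.
    target = v - u
    y = 0.0
    for _ in range(iterations):
        y -= (y + y ** 3 - target) / (1.0 + 3.0 * y * y)
    return (u, y)

def f_in_new_coordinates(u, v):
    # f o h^{-1}; the theorem says this is the projection (u, v) -> v.
    return f(*h_inv(u, v))

if __name__ == "__main__":
    for u, v in [(0.3, 1.0), (-0.5, 0.25), (2.0, 2.0)]:
        print((u, v), f_in_new_coordinates(u, v))
```

In each printed pair the second value reproduces $v$ (up to rounding), whatever $u$ is; that is what it means for $f \circ \mathbf{h}^{-1}$ to be a projection.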

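Corollary 6.13 (Implicit function theorem) Suppose $\mathbf{f} : X^{k+p} \to \mathbb{R}^p : (\mathbf{x},\mathbf{y}) \mapsto \mathbf{z}$
is continuously differentiable and $\mathbf{f}(\mathbf{a},\mathbf{b}) = \mathbf{k}$. If the partial derivative map $\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{a},\mathbf{b})} :
\mathbb{R}^p \to \mathbb{R}^p$ is invertible, then there is a unique map $\mathbf{y} = \boldsymbol{\varphi}(\mathbf{x})$ defined on a neighborhood
$N^k$ of $\mathbf{x} = \mathbf{a}$ in $\mathbb{R}^k$ for which

• $\mathbf{f}(\mathbf{x}, \boldsymbol{\varphi}(\mathbf{x})) = \mathbf{k}$ for all $\mathbf{x}$ in $N^k$.

• $\boldsymbol{\varphi}(\mathbf{a}) = \mathbf{b}$.

• $\boldsymbol{\varphi}$ is continuously differentiable on $N^k$, and

\[ d\boldsymbol{\varphi}_{\mathbf{x}} = -\left(\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}\right)^{-1} \circ \partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}. \]

Proof. Let $\mathbf{h}$ be the coordinate change defined on the neighborhood $N^{k+p}$ of $(\mathbf{a},\mathbf{b})$
in $\mathbb{R}^{k+p}$, as provided by Theorem 6.12; let $\mathbf{h}^{-1}$ be its inverse. We wrote

\[ \mathbf{h}^{-1}(\mathbf{u},\mathbf{v}) = (\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{v})) \]

for a suitably defined continuously differentiable map $\mathbf{g}$ on $P^{k+p} = \mathbf{h}(N^{k+p})$, and
we saw that

\[ \mathbf{v} = \mathbf{f}(\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{v})) \]

for every $(\mathbf{u},\mathbf{v})$ in $P^{k+p}$. In particular, $\mathbf{k} = \mathbf{f}(\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{k}))$.

[Figure: the locus $\mathbf{f}(\mathbf{x},\mathbf{y}) = \mathbf{k}$ over $\mathbf{x}$ in $N^k$, and the corresponding points $(\mathbf{u},\mathbf{k})$ in $P^{k+p}$.]

Now define $N^k$ by the condition

\[ \mathbf{u} \text{ is in } N^k \iff (\mathbf{u},\mathbf{k}) \text{ is in } P^{k+p}, \]

and set $\boldsymbol{\varphi}(\mathbf{x}) = \mathbf{g}(\mathbf{x},\mathbf{k})$. Then $\boldsymbol{\varphi}$ is defined on all of $N^k$, and the equation $\mathbf{k} =
\mathbf{f}(\mathbf{u}, \mathbf{g}(\mathbf{u},\mathbf{k}))$ translates into

\[ \mathbf{f}(\mathbf{x}, \boldsymbol{\varphi}(\mathbf{x})) = \mathbf{k} \quad \text{for every } \mathbf{x} \text{ in } N^k. \]

This verifies the first condition. Also, because $\mathbf{h}(\mathbf{a},\mathbf{b}) = (\mathbf{a}, \mathbf{f}(\mathbf{a},\mathbf{b})) = (\mathbf{a},\mathbf{k})$, we have

\[ (\mathbf{a},\mathbf{b}) = \mathbf{h}^{-1}(\mathbf{a},\mathbf{k}) = (\mathbf{a}, \mathbf{g}(\mathbf{a},\mathbf{k})) = (\mathbf{a}, \boldsymbol{\varphi}(\mathbf{a})), \]

so $\boldsymbol{\varphi}(\mathbf{a}) = \mathbf{b}$, verifying the second condition.

The third condition follows from the chain rule applied to the equation $\mathbf{k} =
\mathbf{f}(\mathbf{x}, \boldsymbol{\varphi}(\mathbf{x}))$. To carry out the differentiation, it will be helpful to define the map
$\boldsymbol{\Phi} : N^k \to \mathbb{R}^{k+p} : \mathbf{x} \mapsto (\mathbf{x}, \boldsymbol{\varphi}(\mathbf{x}))$. Then $\mathbf{k} = \mathbf{f} \circ \boldsymbol{\Phi}(\mathbf{x})$, so

\[
\begin{aligned}
\underbrace{O}_{p \times k} = \underbrace{d\mathbf{f}_{\boldsymbol{\Phi}(\mathbf{x})}}_{p \times (k+p)} \circ \underbrace{d\boldsymbol{\Phi}_{\mathbf{x}}}_{(k+p) \times k}
&= \begin{pmatrix} \partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))} & \partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))} \end{pmatrix}
\begin{pmatrix} I \\ d\boldsymbol{\varphi}_{\mathbf{x}} \end{pmatrix} \\
&= \underbrace{\partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}}_{p \times k} \circ \underbrace{I}_{k \times k}
 + \underbrace{\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}}_{p \times p} \circ \underbrace{d\boldsymbol{\varphi}_{\mathbf{x}}}_{p \times k}.
\end{aligned}
\]

Using the invertibility of $\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}$, we can solve for $d\boldsymbol{\varphi}_{\mathbf{x}}$ to get

\[ d\boldsymbol{\varphi}_{\mathbf{x}} = -\left(\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}\right)^{-1} \circ \partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}, \]

verifying the third condition. □

Summary

Submersions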
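When the constraint is affine, the matrix formula for the derivative of the implicit map can be verified by hand or by machine, because the implicit map itself is explicit. The sketch below is an illustration under our own assumptions: it takes $\mathbf{f}(\mathbf{x},\mathbf{y}) = A\mathbf{x} + B\mathbf{y}$ with $k = p = 2$, where the matrices `A`, `B` and the level `K` are hypothetical values. Then $\partial_{\mathbf{x}}\mathbf{f} = A$, $\partial_{\mathbf{y}}\mathbf{f} = B$, the implicit map is $\boldsymbol{\varphi}(\mathbf{x}) = B^{-1}(\mathbf{k} - A\mathbf{x})$, and its derivative should be $-B^{-1}A$.

```python
A = ((1.0, 2.0), (0.0, 1.0))   # hypothetical partial derivative with respect to x
B = ((2.0, 1.0), (1.0, 1.0))   # hypothetical partial derivative with respect to y (det = 1)
K = (3.0, -1.0)                # hypothetical level k

def matvec(m, v):
    # Product of a 2x2 matrix with a vector in R^2.
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def matmul(m, n):
    # Product of two 2x2 matrices.
    return tuple(tuple(sum(m[i][r] * n[r][j] for r in range(2)) for j in range(2))
                 for i in range(2))

def inverse(m):
    # Inverse of a 2x2 matrix; fails exactly when the matrix is not invertible.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("matrix is not invertible")
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))

def f(x, y):
    ax, by = matvec(A, x), matvec(B, y)
    return (ax[0] + by[0], ax[1] + by[1])

def phi(x):
    # Explicit implicit map: solve A x + B y = K for y.
    ax = matvec(A, x)
    return matvec(inverse(B), (K[0] - ax[0], K[1] - ax[1]))

def dphi():
    # The formula of the corollary: -(d_y f)^{-1} o d_x f = -B^{-1} A.
    return tuple(tuple(-e for e in row) for row in matmul(inverse(B), A))

if __name__ == "__main__":
    x = (0.5, -2.0)
    print(f(x, phi(x)))   # reproduces K
    print(dphi())
```

Since $\boldsymbol{\varphi}$ is affine here, its increments are given exactly by `dphi()`; for a nonlinear constraint the same matrix would only give the first-order change.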

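This final version of the implicit function theorem echoes the first one (Theorem 6.1,
p. 189). In broad outline, it tells us that the locus $\mathbf{f}(\mathbf{x},\mathbf{y}) = \mathbf{k}$ is the graph
of a map $\mathbf{y} = \boldsymbol{\varphi}(\mathbf{x})$ for which $d\boldsymbol{\varphi}_{\mathbf{x}} = -(\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))})^{-1} \circ \partial_{\mathbf{x}}\mathbf{f}_{(\mathbf{x},\boldsymbol{\varphi}(\mathbf{x}))}$, assuming only that
$\partial_{\mathbf{y}}\mathbf{f}_{(\mathbf{x},\mathbf{y})}$ is invertible at a seed point $(\mathbf{x},\mathbf{y}) = (\mathbf{a},\mathbf{b})$, where $\mathbf{f}(\mathbf{a},\mathbf{b}) = \mathbf{k}$. The key to the
proof is that $\mathbf{f}$ is equivalent to a projection near $(\mathbf{a},\mathbf{b})$; in turn, this follows (Theorem 2.19)
from the fact that $d\mathbf{f}_{(\mathbf{a},\mathbf{b})}$ is onto.

A map whose derivative is onto is called a submersion. Ultimately, the proof of
the implicit function theorem can be traced back to the simple fact that $\mathbf{f}$ is a submersion.
Submersions have useful behavior with important consequences (beyond
the implicit function theorem) that we now pause to explore.
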

Definition 6.5 A continuously differentiable map $\mathbf{f} : X^n \to \mathbb{R}^p$ is a submersion at $\mathbf{c}$
if $d\mathbf{f}_{\mathbf{c}} : \mathbb{R}^n \to \mathbb{R}^p$ is onto.

Theorem 6.14. A map $\mathbf{f} : X^n \to \mathbb{R}^p$ is a submersion at $\mathbf{c}$ if and only if there is a
coordinate change $\mathbf{h} : N^n \to \mathbb{R}^n$ defined on a neighborhood $N^n$ of $\mathbf{c}$ for which $\mathbf{f} \circ \mathbf{h}^{-1}$
is a projection.

Proof. Notice that the “only if” part of the theorem is just a restatement of Theorem 6.12.
To prove the converse (the “if” part), let $\mathbf{f} \circ \mathbf{h}^{-1} = \Pi$, a projection. Then $\mathbf{f} = \Pi \circ \mathbf{h}$ and

\[ d\mathbf{f}_{\mathbf{x}} = d\Pi_{\mathbf{h}(\mathbf{x})} \circ d\mathbf{h}_{\mathbf{x}} = \Pi \circ d\mathbf{h}_{\mathbf{x}}, \]

so $\mathbf{f}$ is continuously differentiable on $N^n$ and $d\mathbf{f}_{\mathbf{x}}$ is onto for every $\mathbf{x}$ in $N^n$ because
$\Pi$ and $d\mathbf{h}_{\mathbf{x}}$ are both onto. □

Thus, submersions are precisely the maps that are locally equivalent to projections.
Moreover, because $\mathbf{f} \circ \mathbf{h}^{-1} = \Pi = d\mathbf{f}_{\mathbf{x}} \circ d\mathbf{h}_{\mathbf{x}}^{-1}$, the local coordinate change
$\mathbf{h}^{-1} \circ d\mathbf{h}_{\mathbf{x}}$ transforms $\mathbf{f}$ into its linear approximation $d\mathbf{f}_{\mathbf{x}}$. This is the generalization
of Corollary 6.7, page 197. The next result is a generalization of Corollary 6.8. The
result following that is a consequence of the fact that a submersion is equivalent to
a local projection.

Corollary 6.15 If $\mathbf{f} : X^n \to \mathbb{R}^p$ is a submersion at $\mathbf{c}$, then there are curvilinear coordinates
defined near $\mathbf{c}$ in which $p$ of the $n$ coordinate functions are the component
functions of $\mathbf{f}$.

Proof. This follows immediately from the definition of the coordinate change $\mathbf{h}$ in
the proof of Theorem 6.12. □

Corollary 6.16 If $\mathbf{f} : X^n \to \mathbb{R}^p$ is a submersion at $\mathbf{c}$, then $\mathbf{f}$ maps $X^n$ onto a neighborhood
of $\mathbf{f}(\mathbf{c})$. □

Submersions give us a valuable way to describe and deal with curved surfaces.
To see how this happens, consider first the locus $S_{\mathbf{f}} : \mathbf{f}(\mathbf{x},\mathbf{y}) = \mathbf{k}$ defined by the
submersion $\mathbf{f}$. In general, $S_{\mathbf{f}}$ is a curved subset of $\mathbb{R}^{k+p}$, but of a special kind.
For suppose $\mathbf{c} = (\mathbf{a},\mathbf{b})$ is a seed point of $\mathbf{f}$; that is, $\mathbf{f}(\mathbf{a},\mathbf{b}) = \mathbf{k}$. Then the proof of
the implicit function theorem provides a coordinate change $\mathbf{h} : (\mathbf{x},\mathbf{y}) \mapsto (\mathbf{u},\mathbf{v})$ that
“straightens” $S_{\mathbf{f}}$ locally and makes it a flat $k$-dimensional plane near $\mathbf{c}$. In effect,
$(u_1, \dots, u_k, v_1, \dots, v_p)$ provides new curvilinear coordinates in $(\mathbf{x},\mathbf{y})$-space in which
equations of the form $v_1 = \kappa_1, \dots, v_p = \kappa_p$ specify $S_{\mathbf{f}}$ and the variables $(u_1, \dots, u_k)$
provide a system of curvilinear coordinates on $S_{\mathbf{f}}$ itself. The $k$ coordinates $u_1, \dots,
u_k$ imply $S_{\mathbf{f}}$ is $k$-dimensional. We now use this characterization of $S_{\mathbf{f}}$ as the basis
of the definition of an embedded surface patch.

Definition 6.6 A set $S$ in $\mathbb{R}^n$ is an embedded surface patch of dimension $k$ at the
point $\mathbf{c}$ if there are coordinates $(u_1, \dots, u_k, v_1, \dots, v_{n-k})$ in a window $W^n$ centered at
$\mathbf{c}$ so that $S$ is given by the conditions $v_1 = \kappa_1, \dots, v_{n-k} = \kappa_{n-k}$ there. The variables
$\mathbf{u} = (u_1, \dots, u_k)$ provide coordinates on $S$ in $W^n$.

$\mathbf{f}$ “looks like” $d\mathbf{f}$

Embedded surface patches

Surface patches of dimension 0 or $n$

Surface patches and pullbacks

We may abbreviate this term to surface patch, embedding, or just patch. We can
extend the definition to allow $k = 0$ and $k = n$. An embedded surface patch of dimension
0 at $\mathbf{c}$ is just the point $\mathbf{c}$ itself; it is specified by a full set of $n$ equalities
$v_1 = \kappa_1, \dots, v_n = \kappa_n$. An embedded surface patch of dimension $n$ at $\mathbf{c}$ is just an open
set containing $\mathbf{c}$. It is specified by an empty set of equalities.

Theorem 6.17. Suppose $\mathbf{f} : X^n \to \mathbb{R}^p$ is a submersion at a point $\mathbf{c}$ in $X^n$ and $\mathbf{f}(\mathbf{c}) = \mathbf{k}$.
Then the pullback $\mathbf{f}^{-1}(\mathbf{k})$ is an embedded surface patch of dimension $n - p$ at the
point $\mathbf{c}$. □

This is just Theorem 6.12 restated using surface patches. The following theorem is
its converse; the two taken together imply surface patches are precisely the pullbacks
of points by submersions.

Theorem 6.18. Suppose $S$ is an embedded surface patch of dimension $k$ at a point
$\mathbf{c}$ in $\mathbb{R}^n$. Then there is a submersion $\mathbf{g} : X^n \to \mathbb{R}^{n-k}$ at $\mathbf{c}$ for which $S = \mathbf{g}^{-1}(\mathbf{g}(\mathbf{c}))$.

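Proof. By hypothesis, there is a window $X^n$ centered at $\mathbf{c}$ and a coordinate change
$\mathbf{h} : X^n \to \mathbb{R}^n : \mathbf{x} \mapsto (\mathbf{u},\mathbf{v})$ in terms of which $S$ is given by the equations $v_1 =
v_1(\mathbf{c}), \dots, v_{n-k} = v_{n-k}(\mathbf{c})$, where the constants $v_i(\mathbf{c})$ are the $\mathbf{v}$-coordinates of the
point $\mathbf{c}$. Let us write the components of $\mathbf{h}$ as

\[
\mathbf{h} : \begin{cases}
u_1 = h_1(x_1, \dots, x_n), \\
\quad\vdots \\
u_k = h_k(x_1, \dots, x_n), \\
v_1 = g_1(x_1, \dots, x_n), \\
\quad\vdots \\
v_{n-k} = g_{n-k}(x_1, \dots, x_n),
\end{cases}
\]

and let $\mathbf{g} : X^n \to \mathbb{R}^{n-k}$ be defined by

\[
\mathbf{g} : \begin{cases}
v_1 = g_1(x_1, \dots, x_n), \\
\quad\vdots \\
v_{n-k} = g_{n-k}(x_1, \dots, x_n).
\end{cases}
\]

Then $\mathbf{g}$ is continuously differentiable because $\mathbf{h}$ is. Moreover, for every $\mathbf{x}$ in $X^n$, the
matrix $d\mathbf{g}_{\mathbf{x}}$ has maximal rank $n - k$ because it makes up the last $n - k$ rows of the
invertible matrix $d\mathbf{h}_{\mathbf{x}}$. In particular, $d\mathbf{g}_{\mathbf{c}}$ is onto, and

\[ S = \mathbf{g}^{-1}(v_1(\mathbf{c}), \dots, v_{n-k}(\mathbf{c})) = \mathbf{g}^{-1}(\mathbf{g}(\mathbf{c})). \qquad \Box \]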

Although the point $\mathbf{g}(\mathbf{c})$ has dimension 0, its pullback $\mathbf{g}^{-1}(\mathbf{g}(\mathbf{c}))$ has dimension $k$.
The pullback does not preserve dimension. However, the difference in the dimensions
of the point and its containing space, namely $n - k$, is the same as the difference
in the dimensions of the pullback and its containing space. This suggests that,
in discussing pullbacks, we focus on this difference, called the codimension. We
have already done this for vector spaces and pullbacks of onto linear maps. According
to Definition 2.7 (p. 51), the codimension of a vector subspace $\mathcal{W}$ in a vector
space $\mathcal{V}$ is

\[ \operatorname{codim} \mathcal{W} = \dim \mathcal{V} - \dim \mathcal{W}. \]

By Corollary 2.21, page 51, any onto linear map $L : \mathbb{R}^n \to \mathbb{R}^p$ preserves codimension
under pullback; that is, if $\mathcal{W}$ is a subspace of codimension $m$ in the target $\mathbb{R}^p$, then
the subspace $L^{-1}(\mathcal{W})$ has the same codimension $m$ in the source $\mathbb{R}^n$.

Definition 6.7 We say an embedded surface patch $S$ of dimension $k$ in $\mathbb{R}^n$ has codimension
$m = n - k$.

Note that the codimension of a surface patch is the number of equations (including
$m = 0$ for an open set) that define the patch in Definition 6.6. Furthermore, the
codimension of the surface patch in Theorem 6.18 equals the dimension of the target
of the map $\mathbf{g}$ that defines the patch.

Theorem 6.19. Suppose $\mathbf{f} : X^n \to \mathbb{R}^p$ is a submersion at $\mathbf{c}$, and $S$ is an embedded
surface patch of codimension $m$ at the point $\mathbf{k} = \mathbf{f}(\mathbf{c})$ in $\mathbb{R}^p$. Then $\mathbf{f}^{-1}(S)$ is an
embedded surface patch of codimension $m$ at $\mathbf{c}$ in $X^n$.



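Proof. According to Theorem 6.18, there is a submersion $\mathbf{g} : W^p \to \mathbb{R}^m$ at $\mathbf{k}$ for
which $S = \mathbf{g}^{-1}(\mathbf{g}(\mathbf{k}))$. Because

\[ \mathbf{f}^{-1}(S) = \mathbf{f}^{-1}(\mathbf{g}^{-1}(\mathbf{g}(\mathbf{k}))) = (\mathbf{g} \circ \mathbf{f})^{-1}(\mathbf{g}(\mathbf{k})) = (\mathbf{g} \circ \mathbf{f})^{-1}((\mathbf{g} \circ \mathbf{f})(\mathbf{c})), \]

it is sufficient to show that $\mathbf{g} \circ \mathbf{f} : X^n \to \mathbb{R}^m$ is a submersion at $\mathbf{c}$. But by the chain
rule,

\[ d(\mathbf{g} \circ \mathbf{f})_{\mathbf{c}} = d\mathbf{g}_{\mathbf{k}} \circ d\mathbf{f}_{\mathbf{c}}, \]

and the composite is onto because the individual maps are. □

Dimension and codimension

Submersions preserve embeddings under pullback

Immersions and injections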

Submersions handle pullbacks properly; however, they do not behave well with
push-forwards. That is, if $S$ is a surface patch at $\mathbf{c}$, and $\mathbf{f}$ is a submersion at $\mathbf{c}$, the
image $\mathbf{f}(S)$ is not, in general, a surface patch; see the exercises. We have faced this
dilemma already with linear maps in Chapter 2.3, and we resolved it there (Corollary 2.28,
p. 56) by switching from onto to 1-1 linear maps. To handle push-forwards
properly, we use an immersion, that is, a map whose derivative is 1-1.

Definition 6.8 A continuously differentiable map $\mathbf{f} : X^n \to \mathbb{R}^p$ is an immersion at $\mathbf{c}$
if $d\mathbf{f}_{\mathbf{c}} : \mathbb{R}^n \to \mathbb{R}^p$ is 1-1 (implying $n \le p$).

Recall (Theorem 2.27, p. 56) that every 1-1 linear map can be transformed into a
simple form, called an injection $J : \mathbb{R}^n \to \mathbb{R}^{n+q}$, that is analogous to a projection $\Pi :
\mathbb{R}^{p+k} \to \mathbb{R}^p$. The analogy is easily seen by looking at their matrix representatives:

\[
J = \begin{pmatrix} I_n \\ O \end{pmatrix}, \qquad \Pi = \begin{pmatrix} O & I_p \end{pmatrix}.
\]



Theorem 6.20. A map f: X n —► R” +9 is an immersion at c if and only if there is a 
coordinate change h : A rM+9 —> R” +9 defined on a neighborhood _/V” +9 of f(c) for 
which f = h o f is an injection. 

Proof. To prove the “only if’ part, we assume f is an immersion at c. Hence the (n + 
q) x n-matrix df c : R” —> R" +9 is 1-1 and consequently has n linearly independent 
rows. By rearranging the rows (and the corresponding target variables), if necessary, 
we assume that the first n rows are linearly independent. Write the target variables 
as (y,z), where y = (y\,...,y n ), z = (- 1 , ■ ■ ■ and write f in terms of (vector) 
components as 

f . jy = f iM, 

|z = f 2 (x). 

In particular, fi : X' 1 —> R" is continuously differentiable and the condition on df c 
makes d(fi) c : R" —> R" invertible. By the inverse function theorem (Theorem 5.2, 
p. 169) fj is invertible on some neighborhood Wj 1 offi(c) inR". Let N n+q =N " xR 9 , 
and define h : 7V"+ 9 —> R ,! + 9 : (y,z) —> (y,z) by the vector components 


y = *r‘(y) 

z = -f 2 (fr 1 (y)) + 


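Because its components are continuously differentiable on $N^{n+q}$, and so are those of
its inverse $(\bar{\mathbf{y}}, \bar{\mathbf{z}}) \mapsto (\mathbf{f}_1(\bar{\mathbf{y}}), \bar{\mathbf{z}} + \mathbf{f}_2(\bar{\mathbf{y}}))$, $\mathbf{h}$ is a valid coordinate
change, and it transforms $\mathbf{f}$ into

\[ \begin{aligned} (\mathbf{h} \circ \mathbf{f})(\mathbf{x}) &= \mathbf{h}(\mathbf{f}_1(\mathbf{x}), \mathbf{f}_2(\mathbf{x})) \\ &= \bigl(\mathbf{f}_1^{-1}(\mathbf{f}_1(\mathbf{x})),\; -\mathbf{f}_2(\mathbf{f}_1^{-1}(\mathbf{f}_1(\mathbf{x}))) + \mathbf{f}_2(\mathbf{x})\bigr) \\ &= (\mathbf{x},\; -\mathbf{f}_2(\mathbf{x}) + \mathbf{f}_2(\mathbf{x})) \\ &= (\mathbf{x}, \mathbf{0}). \end{aligned} \]

This is an injection.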
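The construction in the “only if” part can be checked on a concrete map. The following is a minimal sketch in Python, assuming the illustrative immersion $\mathbf{f}(x) = (2x, x^2)$ with $n = q = 1$; this map and all the function names are hypothetical choices, not taken from the text. It builds $\mathbf{h}$ from $\mathbf{f}_1^{-1}$ and $\mathbf{f}_2$ as in the proof and confirms that $\mathbf{h} \circ \mathbf{f}(x) = (x, 0)$.

```python
# Hypothetical example (not from the text): f(x) = (2x, x^2), so n = 1, q = 1.

def f1(x):
    return 2.0 * x        # first component; its derivative 2 is invertible

def f2(x):
    return x * x          # remaining component

def f1_inv(y):
    return y / 2.0        # inverse of f1 (global here; only local in general)

def f(x):
    return (f1(x), f2(x))

def h(y, z):
    """Coordinate change from the proof: (y, z) -> (f1^{-1}(y), -f2(f1^{-1}(y)) + z)."""
    x = f1_inv(y)
    return (x, -f2(x) + z)

def h_of_f(x):
    return h(*f(x))

for x in (-1.5, 0.0, 0.25, 3.0):
    print(x, h_of_f(x))   # first entry reproduces x, second entry is 0
```

In this example $\mathbf{f}_1$ happens to be globally invertible; in general the inverse function theorem only supplies $\mathbf{f}_1^{-1}$ on a neighborhood of $\mathbf{f}_1(\mathbf{c})$.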

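To prove the converse (the “if” part), assume $J = \mathbf{h} \circ \mathbf{f}$ is an injection. Rearrange
the variables, if necessary, so that $J(\mathbf{x}) = (\mathbf{x}, \mathbf{0})$ in $\mathbb{R}^{n+q}$. Then

\[ \mathbf{f}(\mathbf{x}) = \mathbf{h}^{-1}(\mathbf{x}, \mathbf{0}), \]

implying that $\mathbf{f}$ is continuously differentiable wherever it is defined. We must show
it is defined on some open neighborhood of $\mathbf{c}$.

Because $\mathbf{h}(N^{n+q})$ is an open set containing $\mathbf{h}(\mathbf{f}(\mathbf{c}))$ (by the inverse function theorem),
it contains an open “rectangle” $X^n \times Y^q$ centered at $\mathbf{h}(\mathbf{f}(\mathbf{c})) = (\mathbf{c}, \mathbf{0})$. Therefore,
for any $\mathbf{x}$ in $X^n$, $\mathbf{h}^{-1}(\mathbf{x}, \mathbf{0}) = \mathbf{f}(\mathbf{x})$ is defined. Finally,

\[ d\mathbf{f}_{\mathbf{x}} = d\mathbf{h}^{-1}_{J(\mathbf{x})} \circ dJ_{\mathbf{x}} = d\mathbf{h}^{-1}_{J(\mathbf{x})} \circ J, \]

so $d\mathbf{f}_{\mathbf{x}}$ is 1-1 because injections and invertible maps are 1-1. □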

Thus, immersions are precisely the maps that are locally equivalent to injections.
Moreover, because $\mathbf{h} \circ \mathbf{f} = J = d\mathbf{h}_{\mathbf{f}(\mathbf{x})} \circ d\mathbf{f}_{\mathbf{x}}$, we see that coordinate changes locally
transform $\mathbf{f}$ into its linear approximation, so $\mathbf{f}$ “looks like” $d\mathbf{f}$. The following corollary
is an immediate consequence of the fact that an immersion is a local injection.

Corollary 6.21 If $\mathbf{f} : X^n \to \mathbb{R}^{n+q}$ is an immersion at $\mathbf{c}$, then $\mathbf{f}$ is 1-1 on a neighborhood
of $\mathbf{c}$. □

The next theorem, which says that the image of a surface patch under an immersion
is still a surface patch of the same dimension, has a more complicated proof
than its analogue for submersions because surface patches are naturally determined
by submersions (under pullbacks).

Theorem 6.22. Suppose $\mathbf{f} : X^n \to \mathbb{R}^{n+q}$ is an immersion at $\mathbf{c}$, and $S$ is an embedded
surface patch of dimension $k$ at $\mathbf{c}$ in $X^n$. Then the image $\mathbf{f}(S)$ is an embedded surface
patch of the same dimension $k$ at $\mathbf{f}(\mathbf{c})$ in $\mathbb{R}^{n+q}$.

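Proof. By the definition of an embedded surface patch (Definition 6.6), there is a
coordinate change $\mathbf{g} : X^n \to \mathbb{R}^n : \mathbf{x}_{(n)} \mapsto (\mathbf{y}_{(k)}, \mathbf{z}_{(n-k)})$ that “straightens” $S$ near $\mathbf{c}$. Let
us suppose that $\mathbf{g}(S)$ is given by equations $z_1 = \kappa_1, \dots, z_{n-k} = \kappa_{n-k}$ (i.e., $\mathbf{z} = \boldsymbol{\kappa}$) in
the new coordinates. To prove that $\mathbf{f}(S)$ is an embedded surface patch of dimension
$k$ at $\mathbf{f}(\mathbf{c})$, it is sufficient to find new coordinates $(\mathbf{u}_{(k)}, \mathbf{v}_{(n-k)}, \mathbf{w}_{(q)})$ in a neighborhood
$N^{n+q}$ of $\mathbf{f}(\mathbf{c})$ in which $\mathbf{f}(S)$ is specified by the $n - k + q$ equations $\mathbf{v} = \boldsymbol{\kappa}$, $\mathbf{w} = \mathbf{0}$.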


The figure shows how to build the new coordinates. First, use the hypothesis that
$\mathbf{f}$ is an immersion at $\mathbf{c}$ to get (from Theorem 6.20) a coordinate change
$\mathbf{h} : N^{n+q} \to \mathbb{R}^n \times \mathbb{R}^q : \mathbf{r}_{(n+q)} \mapsto (\mathbf{s}_{(n)}, \mathbf{t}_{(q)})$ on a neighborhood $N^{n+q}$ of $\mathbf{f}(\mathbf{c})$ that transforms $\mathbf{f}$ into
an injection to the first $n$ coordinates:

\[ (\mathbf{s}, \mathbf{t}) = (\mathbf{h} \circ \mathbf{f})(\mathbf{x}) = (\mathbf{x}, \mathbf{0}); \qquad (\mathbf{h} \circ \mathbf{f})(S) = (S, \mathbf{0}). \]


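Next, use the coordinate change $\mathbf{g}$ already introduced to define

\[ \mathbf{g} \times I : \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}^k \times \mathbb{R}^{n-k} \times \mathbb{R}^q : (\mathbf{s}, \mathbf{t}) \mapsto (\mathbf{u}_{(k)}, \mathbf{v}_{(n-k)}, \mathbf{w}_{(q)}) \]

on a neighborhood of $(\mathbf{c}, \mathbf{0})$ in $\mathbb{R}^{n+q}$. Then the composite coordinate change $(\mathbf{g} \times I) \circ \mathbf{h}$
“straightens” $\mathbf{f}(S)$ near $\mathbf{f}(\mathbf{c})$: $\mathbf{f}(S)$ is given by the $n - k + q$ equations $\mathbf{v} = \boldsymbol{\kappa}$, $\mathbf{w} = \mathbf{0}$.

□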

Corollary 6.23 If $\mathbf{f} : X^n \to \mathbb{R}^{n+q}$ is an immersion at a point $\mathbf{c}$, then $\mathbf{f}(X^n)$ is an
embedded surface patch of dimension $n$ at the point $\mathbf{f}(\mathbf{c})$ in $\mathbb{R}^{n+q}$.

Proof. Because $X^n$ is an open set in $\mathbb{R}^n$, we can view it as a surface patch of dimension
$n$ (i.e., of codimension 0) at $\mathbf{c}$ in $\mathbb{R}^n$. It follows from the theorem that its image
$\mathbf{f}(X^n)$ is a surface patch of dimension $n$ at $\mathbf{f}(\mathbf{c})$ in $\mathbb{R}^{n+q}$. □

We can think of the previous corollary as the basis for our study of curves
(in Chapter 1.2) and surfaces in space (Chapter 4.3). To see the connection, let
$\mathbf{f} : X^n \to \mathbb{R}^p$ be an arbitrary continuously differentiable map (i.e., not necessarily an
immersion); it is given by $p$ component functions of $n$ real variables:

\[ \mathbf{f} : \begin{cases} y_1 = f_1(x_1, \dots, x_n), \\ \quad \vdots \\ y_p = f_p(x_1, \dots, x_n). \end{cases} \]

If $n = 1$ then $\mathbf{f}$ defines the parametrization of a curve in $\mathbb{R}^p$. Our earlier definition
(Definition 1.2, p. 7) is more restrictive: $\mathbf{f}$ must be smooth (i.e., have continuous
derivatives of all orders) and $d\mathbf{f}_t = \mathbf{f}'(t)$ must be nonzero at each interior
point $t$. Smoothness is mainly just a technical convenience, but the requirement
that $d\mathbf{f}_t \ne \mathbf{0}$ means $\mathbf{f}$ must be an immersion at each interior point. Consequently, a
curve as given by Definition 1.2 is actually embedded at each point: there are curvilinear
coordinates $(x_1, \dots, x_p)$ in $\mathbb{R}^p$ in which the curve is specified by the conditions
$x_2 = \cdots = x_p = 0$, and $x_1$ serves as a parameter along the curve.

If n = 2 and p = 3, then f parametrizes an ordinary surface in space. Such a map 
is an immersion at a point c only if the derivative df c has rank 2, its maximal rank. 
In the examples in Chapter 4.3, f was indeed an immersion at most points, so the 
image surface was embedded there. That is, (by Corollary 6.23) we could introduce 
coordinates (p,q,r) in a neighborhood of such a point so that the surface was given 
locally by the equation r = 0 and (p,q) could serve as coordinates on the surface 
near the point. 

The notable example of a nonimmersion is the crosscap. The crosscap map 



fails to be an immersion at the origin (cf. pp. 127-128). 


Exercises 

6.1. a. Determine the location of the three saddle points and one relative maximum
of $f(x,y) = (3y^2 - x^2)(x - 1)$.

b. Plot together the graphs of $z = f(x,y)$ and $z = 0$ on the square for which
$-0.3 < x < 1.3$, $-0.8 < y < 0.8$. Determine the locus $f(x,y) = 0$ from
the intersection of the two graphs, and note the location of the four critical
points of $f$ in relation to this intersection.

6.2. Solve the equation $e^{xy} = 1$ for $y$ near the point $(2,0)$. What is $dy/dx$ at that
point? Sketch the locus $e^{xy} = 1$.

6.3. Solve the equation $e^{xy} = e$ for $y$ near the point $(2,1/2)$. What is $dy/dx$ at that
point? Sketch the locus $e^{xy} = e$.

6.4. Solve the equation $y^2 - 2y\cos x - \sin^2 x = 0$ for $y$, and determine $dy/dx$.
Sketch the locus $y^2 - 2y\cos x - \sin^2 x = 0$.

6.5. Solve the equation $y^2 - 2y\cos x + \sin^2 x = 0$ for $y$; for which values of $x$ is $y$
undefined (as a function of $x$)? Determine $dy/dx$; where is $dy/dx = \infty$? Sketch
the locus $y^2 - 2y\cos x + \sin^2 x = 0$.

6.6. a. Solve the equation $x^2 + 3xy + 4y^2 = 14$ for $x$ in terms of $y$. Determine $dx/dy$
when $y = 1$.

b. Sketch the locus $f(x,y) = x^2 + 3xy + 4y^2 = 14$.

c. Determine the implicit function $y = \varphi(x)$ for which $f(x, \varphi(x)) = 14$ and
$\varphi(2) = 1$. Determine $\varphi'(2)$ and relate it to the value of $dx/dy$ that you
found in part (a).

6.7. Determine the linearization of the locus $f(x,y) = 0$ at the given point $(a,b)$.
Indicate whether the linearization is the tangent line to the locus at that point.

a. $f(x,y) = y^2 + x^2(x + 1)$; $(a,b) = (-1,0)$.
b. $f(x,y) = y^2 + x^2(x + 1)$; $(a,b) = (0,0)$.
c. $f(x,y) = (3y^2 - x^2)(x - 1)$; $(a,b) = (1,0)$.
d. $f(x,y) = (3y^2 - x^2)(x - 1)$; $(a,b) = (1, 1/\sqrt{3})$.
e. $f(x,y) = (3y^2 - x^2)(x - 1)$; $(a,b) = (0,0)$.
f. $f(x,y) = x^3 + y^3$; $(a,b) = (0,0)$.
g. $f(x,y) = x^3 + y^3$; $(a,b) = (1,-1)$.

6.8. a. Sketch representative level curves of $f(x,y) = xy^2$ in the window $W$ for
which $1 < x < 2$, $1 < y < 2$. Verify that every point of $W$ is a regular point
of $f$.

b. Obtain the map $\mathbf{h} : W \to \mathbb{R}^2$ that straightens the level curves of $f$, using the
construction in the proof of Theorem 6.2. Then, using a suitable parametrization
of a level curve, verify that it does indeed have a horizontal image
under $\mathbf{h}$.

c. Show that the image of $W$ is the set

\[ 1 < u < 2, \qquad u < v < 4u. \]

Sketch level curves of $f$ that meet either the top or the bottom of $W$, and
then sketch their images under $\mathbf{h}$. Where do those images meet the boundary
of $\mathbf{h}(W)$?

d. Obtain the formula for the inverse of $\mathbf{h}$ on $\mathbf{h}(W)$.

6.9. a. Sketch representative level curves of $f(x,y) = x^2 + y^2$ in the window $W$ for
which $1 < x < 2$, $1 < y < 2$. Verify that every point of $W$ is a regular point
of $f$.

b. Obtain the map $\mathbf{h} : W \to \mathbb{R}^2$ that straightens the level curves of $f$, using the
construction in the proof of Theorem 6.2. Then, using a suitable parametrization
of a level curve, verify that it does indeed have a horizontal image
under $\mathbf{h}$.

c. Show that the image of $W$ is the set

\[ 1 < u < 2, \qquad u^2 + 1 < v < u^2 + 4, \]

and sketch the image, including images of the level curves of $f$.

6.10. a. Let $f(x,y) = x^2 + y^2$ and let $Z$ be the window $0 < x < 2$, $-1 < y < 1$. Verify
that every point of $Z$ except the origin is a regular point of $f$. Sketch the
level curves of $f$ in $Z$. Note that $f_y = 0$ at the center of $Z$.

b. Show that the map $\mathbf{h} : Z \to \mathbb{R}^2$,

\[ \mathbf{h} : \begin{cases} u = y, \\ v = f(x,y), \end{cases} \]

is a valid coordinate change near $(1,0)$; that is, show $\mathbf{h}$ is continuously
differentiable with a continuously differentiable inverse in a neighborhood
of $(1,0)$.

c. Show that $\mathbf{h}$ “straightens out” the level curves of $f$. Describe the salient
geometric features of the action of $\mathbf{h}$; in particular, indicate what happens
to a horizontal line in $Z$.


6.11. Consider the function

\[ f(u,v,w) = \frac{1+w}{1-w} \cdot \frac{1-u}{1+u} \cdot \frac{1-v}{1+v}, \]

for $-1 < u, v, w < 1$. Think of $u$, $v$, and $w$ as speeds expressed as fractions of
the speed of light. Note that $f(0,0,0) = 1$. This exercise studies the implicit
function $w = \varphi(u,v)$ defined near $(u,v) = (0,0)$ by the equation $f(u,v,w) = 1$.

a. Use $f$ to compute the partial derivatives $\partial\varphi/\partial u$ and $\partial\varphi/\partial v$, and deduce
that $\varphi(u,v) = u + v + O(2)$.

b. Show that

\[ w = \varphi(u,v) = \frac{u + v}{1 + uv} = u \oplus v. \]

This defines a binary operation called the law of addition of velocities in special
relativity. That is, if observer $A$ is moving away from observer $B$ with
velocity $u$ (as a fraction of the speed of light), and $B$ is moving away from $C$
along the same straight line with velocity $v$, then $A$ will be moving away from
$C$ with velocity $w = u \oplus v$. According to part (a), if $u$ and $v$ are small, then
$u \oplus v \approx u + v$, but not otherwise.

c. Show that $u \oplus v$ is defined for all $|u| < 1$ and $|v| < 1$, and that $|u \oplus v| < 1$.

d. Show that $\lim_{u \to 1} u \oplus v = 1$, allowing us to extend $\varphi$ so that $1 \oplus v = 1$ (and,
by symmetry, $u \oplus 1 = 1$).

Thus, if $A$ now represents a photon (a light particle), it moves away from
$B$ and from $C$ at the same speed, even though $B$ is moving in relation to $C$.
Special relativity is built on the premise that the speed of light is an invariant
for all observers moving uniformly in relation to each other.

6.12. Prove Theorem 6.3 and Corollary 6.4. (Suggestion: Adapt the proofs of their
2-dimensional analogues.)

6.13. Show that any set of points in the $(x,y)$-plane that can occur as the zero-locus
of a function of $x$ and $y$ can occur as the zero-locus of a suitably chosen
function of $x$, $y$, and $z$ in the $(x,y,0)$-plane. (Suggestion: consider $f(x,y,z) =
[g(x,y)]^2 + z^2$.)

6.14. Sketch the intersection of the surfaces $S_f : x^2 + y^3 - z = 0$, $S_g : z = 0$. Is
the intersection the graph of a continuously differentiable function $y = \varphi(x)$
within the plane $z = 0$? Explain. Address the same question using a function
of the form $x = \psi(y)$.

6.15. Show that the surfaces defined by $(x + y)^3 - z = 0$ and $z = 0$ intersect in a
straight line. Verify that the surfaces are not transverse at any intersection
point.

6.16. Let $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by $(s,t) = \mathbf{f}(x,y,z) = (x, (y - z)^3)$.

a. Determine the image of $d\mathbf{f}_{\mathbf{0}}$ and show thereby that it is 1-dimensional.

b. Show that $\mathbf{f}$ maps any window centered at $\mathbf{x} = (0,0,0)$ onto a small window
centered at $\mathbf{s} = (0,0)$. In particular, show that, for any $a$, $b$ near 0,
the equation $\mathbf{f}(x,y,z) = (a,b)$ has a one-parameter family (i.e., a curve) of
solutions. Determine that curve.

c. Conclude that $\mathbf{f}$ does not “look like” $d\mathbf{f}_{\mathbf{0}}$ near the origin.

6.17. In $(x,y,z)$-space, $x^2 + y^2 = r^2$ is a cylinder of radius $r > 0$ whose axis is the
$z$-axis, and $x^2 + z^2 = 1$ is a cylinder of radius 1 whose axis is the $y$-axis.

a. Sketch the intersection of the two cylinders when $r^2 = 3/4$. Now let $r$ be
arbitrary, assuming only that $r < 1$. Find implicit functions $y = \varphi(x)$ and
$z = \psi(x)$ determined by the equations $x^2 + y^2 = r^2$ and $x^2 + z^2 = 1$. Do
this for each of the four seed points $(0, \pm r, \pm 1)$. What are the domains of
definition of $\varphi$ and $\psi$?

b. Sketch the intersection of the two cylinders when $r^2 = 4$. Now let $r$ be
arbitrary, assuming only that $r > 1$. Find implicit functions $y = \varphi(x)$ and
$z = \psi(x)$ determined by the equations $x^2 + y^2 = r^2$ and $x^2 + z^2 = 1$ and the
four seed points $(0, \pm r, \pm 1)$. Now what are the domains of definition of $\varphi$
and $\psi$?

c. The implicit functions take simple forms when $r = 1$. What are those
forms, and what is the shape of the intersection?


Chapter 7 

Critical Points 


Abstract At a regular point, the linear terms of a function determine its local behavior,
and there is a local coordinate change that transforms the function into one
of the new coordinates. At a critical point, the linear terms vanish, but there is still
an analogous result for the quadratic terms, called Morse’s lemma. However, the
quadratic terms may not determine the local behavior, but when they do (the critical
point is then said to be nondegenerate), Morse’s lemma provides a local coordinate
change that transforms the function into a sum of positive and negative squares of
the new coordinates. In this chapter we analyze Morse’s lemma and use it to characterize
critical points.


7.1 Functions of one variable 

Let us see how a coordinate change can transform $y = f(x)$ into a pure square near
a critical point $x = a$. As happens so often in local analysis, the key tool is Taylor’s
theorem. We need the first-order expansion; it helps us to write the remainder using
the explicit integral formula that is given in the original formulation of the theorem
(Theorem 3.9, p. 79). In fact, because $f'(a) = 0$, the only nonconstant term in the
expansion is the remainder:

\[ f(a + \Delta x) = f(a) + h(\Delta x)\,(\Delta x)^2. \]

The variable coefficient $h(\Delta x)$ in the remainder term is the integral

\[ h(\Delta x) = \int_0^1 f''(a + t\,\Delta x)\,(1 - t)\,dt. \]

Because we need $h(\Delta x)$ to be continuously differentiable for all $\Delta x$ near 0, we require
$f$ to have a continuous third derivative. Then

\[ h'(\Delta x) = \int_0^1 f'''(a + t\,\Delta x)\,t(1 - t)\,dt. \]

Substituting $\Delta x = 0$ gives

\[ h(0) = f''(a) \int_0^1 (1 - t)\,dt = \frac{f''(a)}{2}, \qquad h'(0) = f'''(a) \int_0^1 t(1 - t)\,dt = \frac{f'''(a)}{6}. \]

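If a coordinate change $\Delta x \to \Delta u$ is to transform $\Delta y = f(a + \Delta x) - f(a)$ into a pure
square, $\Delta y = \pm(\Delta u)^2$, then $\Delta u$ must be

\[ \Delta u = p(\Delta x) = \Delta x \sqrt{|h(\Delta x)|}. \]

It remains to see whether $p$ is a valid coordinate change. Before we do this, note
how the “±” comes into play in the formula for $\Delta y$. Because

\[ (\Delta u)^2 = (\Delta x)^2\,|h(\Delta x)| = \begin{cases} (\Delta x)^2 h(\Delta x) = \Delta y & \text{if } h(\Delta x) > 0, \\ -(\Delta x)^2 h(\Delta x) = -\Delta y & \text{if } h(\Delta x) < 0, \end{cases} \]

it follows that

\[ \Delta y = \begin{cases} +(\Delta u)^2 & \text{if } h(\Delta x) > 0, \\ -(\Delta u)^2 & \text{if } h(\Delta x) < 0. \end{cases} \]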

Now consider the function $p$. Formal differentiation gives

\[ p'(\Delta x) = \sqrt{|h(\Delta x)|} \pm \frac{\Delta x \, h'(\Delta x)}{2\sqrt{|h(\Delta x)|}}, \]

implying $p$ has a continuous derivative on any interval where $h(\Delta x) \ne 0$. (The sign
in the formula for $p'$ is chosen to be the sign of $h(\Delta x)$.) Moreover,

\[ p'(0) = \sqrt{|h(0)|} = \sqrt{|f''(a)|/2}. \]

Thus, if $f''(a) \ne 0$, the inverse function theorem implies that $p$ is a valid coordinate
change on an open interval containing $\Delta x = 0$, and we then have

\[ y = \begin{cases} f(a) + (\Delta u)^2 & \text{if } f''(a) > 0, \\ f(a) - (\Delta u)^2 & \text{if } f''(a) < 0. \end{cases} \]


If $f''(a) = 0$, our argument fails to obtain the coordinate change $p$, but it is natural
to ask if a better argument would repair the problem. The answer is no; that is, if
$f''(a) = 0$, there may be no new coordinate $\Delta u$ for which $y = f(a) \pm (\Delta u)^2$. We can
see this geometrically, because the equation $y = f(a) + (\Delta u)^2$ necessarily implies
$f$ has a minimum at $a$, and $y = f(a) - (\Delta u)^2$ implies $f$ has a maximum there. But
the function $f(x) = x^3$ has a critical point at the origin for which $f''(0) = 0$, and the
origin is neither a minimum nor a maximum.

For functions of a single variable, the preceding discussion establishes two results:
Morse’s lemma and the more familiar second derivative test.


Theorem 7.1 (Morse’s lemma). Suppose $y = f(x)$ has a continuous third derivative
on an open interval that includes a critical point $x = a$ where $f''(a) \ne 0$. Then in a
sufficiently small window centered at $x = a$ there is a coordinate change $\Delta u = p(\Delta x)$
for which

\[ \Delta y = \pm(\Delta u)^2, \]

where the sign of $(\Delta u)^2$ is chosen to be the sign of $f''(a)$. □


Theorem 7.2 (Second derivative test). Suppose $y = f(x)$ has a continuous third
derivative on an open interval containing a critical point $x = a$; then the critical
point is

• A local minimum of $f$ if $f''(a) > 0$

• A local maximum of $f$ if $f''(a) < 0$

If $f''(a) = 0$, the test is inconclusive. □


Thus, a function “looks like” its quadratic approximation near a point where the
linear approximation breaks down (i.e., at a critical point), assuming the quadratic
approximation does not itself break down. We already have names to distinguish between
points where the linear approximation to a function breaks down and where
it does not (critical and regular points, respectively). Morse’s lemma suggests we
make a similar distinction for critical points. Thus we say a critical point is degenerate
if the quadratic approximation “breaks down,” or “degenerates,” in the sense
that it fails to determine the local behavior of the function. Otherwise, we say the
critical point is nondegenerate. For a function of one variable, the situation is clear-cut:
a critical point is degenerate if and only if the second derivative vanishes. For
functions of more than one variable, there are several second partial derivatives; as
we show in the following sections, a critical point may be degenerate even though
all of those second derivatives are nonzero. The relation between degeneracy and
the second derivatives is more subtle.

To see how Morse’s lemma works, let us apply it to $f(x) = x - x^3/3$ at the critical
point $x = 1$. Because $f''(1) = -2$, the point is a local maximum. In terms of window
coordinates $(\Delta x, \Delta y)$ centered at $(x,y) = (1, 2/3)$, the formula for $f$ becomes

\[ \begin{aligned} \Delta y &= f(1 + \Delta x) - f(1) \\ &= 1 + \Delta x - \bigl(1 + 3\Delta x + 3(\Delta x)^2 + (\Delta x)^3\bigr)/3 - (1 - 1/3) \\ &= -(\Delta x)^2 - (\Delta x)^3/3 = (\Delta x)^2(-1 - \Delta x/3). \end{aligned} \]

Thus $h(\Delta x) = -1 - \Delta x/3$, and so $\Delta y = -(\Delta u)^2$ when we set

\[ \Delta u = p(\Delta x) = \Delta x \sqrt{1 + \Delta x/3}. \]

[Figure: the cubic $\Delta y = -(\Delta x)^2 - (\Delta x)^3/3$ drawn on a nonuniform grid (left) and its image, the parabola $\Delta y = -(\Delta u)^2$, drawn on a uniform grid (right).]


The coordinate change $p$ maps the nonuniform grid on the left, above, to the uniform
grid on the right, transforming the original cubic curve into a simple parabola.
Notice that $p$ pushes points to the left of the origin closer together horizontally; this
happens because $0 < p'(\Delta x) < 1$ when $-2 < \Delta x < 0$. To the right of the origin, where
$1 < p'(\Delta x)$, points are pushed apart. Finally, because $p'(0) = 1$, the two grids have
essentially the same spacing near $\Delta x = 0$. That implies the cubic and the parabola
“share ink” near the origin, as the gray copy of the parabola on the left makes clear.
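The claims about this particular coordinate change are easy to confirm numerically. The sketch below is an illustrative Python check, not part of the original text; the function names are hypothetical, and the closed form $p'(\Delta x) = (1 + \Delta x/2)/\sqrt{1 + \Delta x/3}$ used in `p_prime` is obtained by applying the product rule to $p(\Delta x) = \Delta x\sqrt{1 + \Delta x/3}$.

```python
import math

def f(x):
    return x - x**3 / 3.0

def delta_y(dx):
    # Window coordinate: change in f measured from the critical point x = 1
    return f(1.0 + dx) - f(1.0)

def p(dx):
    # Coordinate change du = p(dx) = dx * sqrt(1 + dx/3), defined for dx > -3
    return dx * math.sqrt(1.0 + dx / 3.0)

def p_prime(dx):
    # Product rule applied to p: (1 + dx/2) / sqrt(1 + dx/3)
    return (1.0 + dx / 2.0) / math.sqrt(1.0 + dx / 3.0)

for dx in (-1.5, -0.5, 0.0, 0.5, 1.5):
    du = p(dx)
    # Second and third columns agree: delta_y equals -(du)^2
    print(dx, delta_y(dx), -du * du, p_prime(dx))
```

The last column shows the spacing behavior described above: values below 1 to the left of the origin, exactly 1 at the origin, and values above 1 to the right.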

The figure also shows that the coordinate change reverses the concavity of part 
of the graph. For example, at the point A the original cubic is concave up, but at 
its image A' the parabola is concave down. We associate concavity with the sign 
of the second derivative, so a coordinate change can reverse the sign of the second 
derivative. If this were to happen at a critical point (A is not a critical point), the 
second derivative test would be completely meaningless. 

Let us see how a coordinate change can alter the sign of the second derivative at
a noncritical point. Assume $y = f(x)$ is a differentiable function with $f(0) = 0$, and
let $x = h(u)$ be a coordinate change with $h(0) = 0$. Then

\[ y = f(x) = f(h(u)) = g(u) \]

defines the transformed function $g$, and we compare $g''(0)$ with $f''(0)$. We have

\[ g'(u) = f'(x) \cdot h'(u) \quad \text{and} \quad g''(u) = f''(x) \cdot (h'(u))^2 + f'(x) \cdot h''(u), \]

and our assumptions about the values of $h$ and $f$ at the origin give us

\[ g'(0) = f'(0) \cdot h'(0) \quad \text{and} \quad g''(0) = f''(0) \cdot (h'(0))^2 + f'(0) \cdot h''(0). \]

Because $h$ is a coordinate change near the origin, $h'(0) \ne 0$ and the first equation
implies

\[ g'(0) = 0 \iff f'(0) = 0. \]

In other words, the origin is a critical point in one coordinate system if and only
if it is a critical point in the other. A critical point is thus a geometric property of
a function; its presence does not depend upon the coordinates used to describe the
function.

Now suppose the origin is a critical point. Then the equation for $g''(0)$ reduces to

\[ g''(0) = f''(0) \cdot (h'(0))^2, \]

implying that the second derivatives of $f$ and $g$ have the same sign at the origin,
and confirming what is implicit in the second derivative test. If, on the contrary, the
origin is not a critical point, then $f'(0) \ne 0$ and the equation for $g''(0)$ now includes
the term $f'(0) \cdot h''(0)$. When this additional term is taken into account, $g''(0)$ may
well differ in sign from $f''(0)$.

Here is an example to illustrate the “volatility” of the sign of the second derivative
under a coordinate change near a regular point. Let

\[ y = f(x) = e^x, \quad x = h(u) = \tfrac{1}{2}\ln(u + 1); \quad \text{then } y = g(u) = \sqrt{u + 1}. \]

The two graphs make the point immediately: the exponential function has a graph
that is everywhere concave up but the square root function has a graph that is everywhere
concave down. Let us go through the analytic details. First note that the origin
is not a critical point, because $f'(0) = 1$ and $g'(0) = \tfrac{1}{2}$. (The values of the derivatives
need not agree, but one cannot be zero unless the other is.) Second, we have
$h'(0) = \tfrac{1}{2}$ and $h''(0) = -\tfrac{1}{2}$. Finally, for the second derivatives we have $f''(0) = 1$
and

\[ g''(0) = f''(0) \cdot (h'(0))^2 + f'(0) \cdot h''(0) = 1 \cdot \tfrac{1}{4} + 1 \cdot \bigl(-\tfrac{1}{2}\bigr) = -\tfrac{1}{4}. \]
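As an independent check of this example, the following Python sketch compares the chain-rule value of $g''(0)$ with a central-difference estimate. It is an illustration only: the helper `second_derivative` and its step size are assumptions made here, not something the text prescribes.

```python
import math

def f(x):
    return math.exp(x)

def h(u):
    return 0.5 * math.log(u + 1.0)

def g(u):
    # Transformed function g = f o h, which equals sqrt(u + 1)
    return f(h(u))

def second_derivative(fn, t, step=1e-4):
    # Central-difference estimate of fn''(t)
    return (fn(t + step) - 2.0 * fn(t) + fn(t - step)) / (step * step)

# Values at the origin used in the text
f1, f2 = 1.0, 1.0          # f'(0), f''(0)
h1, h2 = 0.5, -0.5         # h'(0), h''(0)

g2_chain = f2 * h1**2 + f1 * h2     # chain-rule formula for g''(0)
g2_numeric = second_derivative(g, 0.0)

print(g2_chain, g2_numeric)         # both close to -0.25
print(second_derivative(f, 0.0))    # close to +1: f is concave up at the origin
```

The two second derivatives have opposite signs even though $f$ and $g$ represent the same function in different coordinates, which is exactly the volatility the example describes.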

One of the main objects of this book is to bring to the fore the geometric character
of functions and maps. The geometric attributes of a map are the ones left unchanged
when the coordinates change. The eigenvalues of a linear map are geometric in this
way, and so are its rank and nullity. For a nonlinear function, we tend to concentrate
on local behavior, for then we can hope to bring calculus to bear. Thus, critical
points are genuine geometric features of a function: if the first derivative equals
zero in one coordinate system, it will be zero in every other. The concavity of a
function at a critical point is likewise geometric: if the critical point is a minimum
in one coordinate system, it will be a minimum in every other.

But the concavity of a function at a noncritical point is not geometric. This does
not mean we cannot calculate concavity. We can; it is given by the sign of the second
derivative. Concavity is nongeometric because the graphs that represent the
same function in two different coordinate systems can have opposite concavities at
the same (noncritical) point. An individual representative will have a particular concavity
at a point, but other, equally valid, representatives will have the opposite
concavity.

A Taylor expansion gives us a good way to think about the role and the significance
of the various derivatives of a function. Expanding $y = f(x)$ in window
coordinates near $x = a$ gives

\[ \Delta y = f'(a)\,\Delta x + \tfrac{1}{2} f''(a)\,(\Delta x)^2 + \tfrac{1}{6} f'''(a)\,(\Delta x)^3 + \cdots. \]

The first nonzero term dominates. Thus, if $f'(a) \ne 0$, then the local behavior of $f$
near $a$ is entirely determined by $f'(a)$: the inverse function theorem implies there is
a coordinate change $\Delta x = h(\Delta u)$ for which

\[ \Delta y = f'(a)\,\Delta u. \]

The linear term dominates; all the other terms vanish, implying all the higher derivatives,
including the second derivative, have become zero in the new coordinate system.

If, by contrast, the linear term is missing, $f'(a) = 0$, then dominance is transferred
to the quadratic term. If $f''(a) \ne 0$, that is exactly what happens. Morse’s lemma
implies there is a coordinate change $\Delta x = k(\Delta v)$ for which

\[ \Delta y = \tfrac{1}{2} f''(a)\,(\Delta v)^2. \]

In summary: $f'(a)$ determines the local behavior of $f$ when $f'(a) \ne 0$; $f''(a)$ is
geometrically irrelevant. But if $f'(a) = 0$, then $f''(a)$ determines local behavior, at
least if $f''(a) \ne 0$. If $f''(a) = 0$, then the cubic term should dominate, and so forth.


7.2 Functions of two variables 


The local behavior of a function of one variable near a critical point is determined by 
the single quadratic term in the Taylor expansion, if that term is present. However, 
critical points of a function of two or more variables are more complicated: the 
local behavior of a function may not be determined by the quadratic terms, even 
when they are all present. Let us see how this can happen. 



Consider first the pair of functions

\[ f_+(x,y) = x^2 + y^4 \quad \text{and} \quad f_-(x,y) = x^2 - y^4. \]

Each has a critical point at the origin, and each function serves as its own Taylor
expansion there. In both cases, the quadratic part of the expansion is $Q_f(x,y) = x^2$;
the $y$-variable is absent. If local behavior at a critical point were always determined
by the quadratic terms, we would have to conclude that $f_+$ and $f_-$ have the same
local behavior at the origin. But they obviously do not: the graph of $f_+$ is a bowl,
and $f_+$ has a minimum at the origin. The graph of $f_-$ is a saddle, and $f_-$ has a
“minimax” there.

The crucial distinction between $f_+$ and $f_-$ lies in the way the $y$-variable appears
in their formulas, but $Q_f$ has no $y$-terms so it cannot “see” that distinction. The
missing terms mean that, although $z = Q_f(x,y)$ does have a minimum at the origin,
the minimum is nonisolated: all points along the $y$-axis are minima. (By contrast,
the minimum of $f_+$ is an isolated critical point.) As a result, the graph of $Q_f$ is
neither a bowl nor a saddle; it has a new shape that we call a “gutter.” If the bottom
of the gutter were to be bent up (e.g., by the addition of $+y^4$), it becomes a bowl;
bent down (e.g., by adding $-y^4$), it becomes a saddle.

It appears we can attribute the degeneracy of the critical point of $f_+$ or $f_-$ to this
defect in $Q_f$. In fact, this is true, but it is not the whole story: we now show that,
even if all three quadratic terms are nonzero, those terms may still not determine
local behavior. Transform $f_+$ and $f_-$ by rotating coordinates 45° (dilating by $\sqrt{2}$, to
keep the formulas simple). That is, let

\[ L : \begin{cases} x = u - v, \\ y = u + v, \end{cases} \]

and let

\[ \begin{aligned} g_+(u,v) &= f_+(L(u,v)) = u^2 - 2uv + v^2 + u^4 + 4u^3v + 6u^2v^2 + 4uv^3 + v^4, \\ g_-(u,v) &= f_-(L(u,v)) = u^2 - 2uv + v^2 - u^4 - 4u^3v - 6u^2v^2 - 4uv^3 - v^4. \end{aligned} \]

Each of the new functions still has a critical point at the origin, and each formula
still serves as its own Taylor expansion there. There is no qualitative change, either:
$g_+$, like $f_+$, has a minimum at the origin, and $g_-$, like $f_-$, has a saddle. Because the
quadratic parts of the new functions are identical,

\[ Q_g(u,v) = u^2 - 2uv + v^2, \]

the new $Q_g$ does no better at determining local behavior than the original $Q_f$ did,
even though all three quadratic terms are present in $Q_g$.
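The expansions of $g_\pm$ can be verified mechanically. The sketch below is an illustrative Python check with hypothetical function names; it assumes $f_\pm(x,y) = x^2 \pm y^4$ and the substitution $x = u - v$, $y = u + v$ from the text, and it uses integer sample points so that the comparisons are exact.

```python
def f_plus(x, y):
    return x**2 + y**4

def f_minus(x, y):
    return x**2 - y**4

def L(u, v):
    # Rotation by 45 degrees combined with dilation by sqrt(2)
    return (u - v, u + v)

def Q_g(u, v):
    # Common quadratic part of g_plus and g_minus
    return u**2 - 2*u*v + v**2

def quartic(u, v):
    # Expansion of (u + v)^4
    return u**4 + 4*u**3*v + 6*u**2*v**2 + 4*u*v**3 + v**4

def g_plus(u, v):
    return Q_g(u, v) + quartic(u, v)

def g_minus(u, v):
    return Q_g(u, v) - quartic(u, v)

for u in range(-3, 4):
    for v in range(-3, 4):
        assert f_plus(*L(u, v)) == g_plus(u, v)
        assert f_minus(*L(u, v)) == g_minus(u, v)

# Along the line v = u the quadratic part vanishes; only the quartic terms
# distinguish the bowl from the saddle there.
print([Q_g(t, t) for t in (1, 2, 3)])
print([g_plus(t, t) for t in (1, 2, 3)], [g_minus(t, t) for t in (1, 2, 3)])
```

Along $v = u$ the quadratic part $Q_g$ is identically zero, so it cannot tell $g_+$ (positive there) from $g_-$ (negative there).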

The formula for $Q_g$ is different from the formula for $Q_f$; however, its graph is
not, because the rotation-dilation that transforms $f_\pm$ into $g_\pm$ also transforms $Q_f$
into $Q_g$. The graph of $Q_g$ is just the graph of $Q_f$ rotated 45°, a gutter whose bottom
lies along the line $v = u$. Thus, without referring directly to the connection between
$g_+$ and $f_+$, we can still attribute the degeneracy of the critical point of $g_+$ at the
origin to the fact that the graph of $Q_g$ is a gutter. In geometric terms, $Q_g$ has the
same defect as $Q_f$.

In analytic terms, the defect arises because there is a coordinate change that transforms
$Q_g$ into a single square, $z = \pm x^2$, so that the other variable is completely
missing. To determine when a critical point is degenerate, we must therefore decide
when a general function of the form

\[ Q(x,y) = Ax^2 + 2Bxy + Cy^2 \]

can be transformed into a single square. To do this it helps to use vector and matrix
notation.


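Definition 7.1 A quadratic form in two variables is a function of the form

\[ Q(x,y) = Ax^2 + 2Bxy + Cy^2 = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} A & B \\ B & C \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \mathbf{x}^\dagger M \mathbf{x} = Q(\mathbf{x}). \]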

The symmetric matrix M is called the matrix of the quadratic form. There is a 
1-1 correspondence: Q <-> M. That is, every symmetric matrix determines a unique 
quadratic form, and every quadratic form determines a unique symmetric matrix. 
The symmetry is essential for uniqueness, because, for example, 



This points up the fact that if we start with any $2 \times 2$ matrix $A$, the formula $Q(\mathbf{x}) =
\mathbf{x}^\dagger A \mathbf{x}$ defines a unique quadratic form. However, if we start instead with the form
$Q$, there is only one symmetric matrix $M$ for which $\mathbf{x}^\dagger M \mathbf{x} = Q(\mathbf{x})$. (Thus we write
the $xy$ coefficient of $Q$ as $2B$ to simplify splitting it into two equal parts on the
“off-diagonal” of $M$, to make $M$ symmetric.)

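Suppose $L$ is an invertible $2 \times 2$ matrix so $\mathbf{x} = L\mathbf{u}$ is a linear coordinate change.
Then, in terms of the new coordinates $\mathbf{u} = (u,v)$, the quadratic form $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$
is transformed into

\[ \bar{Q}(\mathbf{u}) = Q(L\mathbf{u}) = (L\mathbf{u})^\dagger M (L\mathbf{u}) = \mathbf{u}^\dagger (L^\dagger M L)\,\mathbf{u}. \]

Thus $\bar{Q}$ is also a quadratic form. Furthermore, $L^\dagger M L$ is symmetric (here $L^\dagger$ is the
transpose of $L$) because $(L^\dagger M L)^\dagger = L^\dagger M^\dagger L^{\dagger\dagger} = L^\dagger M L$; therefore $\bar{M} = L^\dagger M L$ is the
matrix of $\bar{Q}$. For example, if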





and $L$ =



then

\[ \bar{Q} \leftrightarrow \begin{pmatrix} 10 & 14 \\ 14 & 7 \end{pmatrix}; \]

that is, $L$ transforms $Q = 5x^2 + 6xy - y^2$ to $\bar{Q} = 10u^2 + 28uv + 7v^2$; see the exercises.
Note that, because $L$ is invertible by definition, $L^\dagger M L$ is invertible if and only if $M$ is.
The following theorem identifies the quadratic forms that have the defect we have 
come to associate with degenerate critical points. Although the theorem is a special 
case of Theorem 7.10 (see below, p. 244), we give it its own proof. 
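The rule $\bar{M} = L^\dagger M L$ is simple to compute with. The Python sketch below is illustrative only: the matrix $M$ is read off from $Q = 5x^2 + 6xy - y^2$, but the matrix `L` is an assumed choice made here, picked because it reproduces the stated result $\bar{Q} = 10u^2 + 28uv + 7v^2$; it is not necessarily the matrix used in the text, and other matrices could give the same $\bar{M}$.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transform(M, L):
    """Matrix of the transformed form: L^T M L."""
    return matmul(matmul(transpose(L), M), L)

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

M = [[5, 3], [3, -1]]      # matrix of Q = 5x^2 + 6xy - y^2
L = [[1, 2], [1, -1]]      # ASSUMED coordinate change x = u + 2v, y = u - v

M_bar = transform(M, L)
print(M_bar)               # matrix of the transformed form
print(det2(M), det2(L), det2(M_bar))
```

The determinants illustrate the remark about invertibility: $\det \bar{M} = (\det L)^2 \det M$, so $\bar{M}$ is invertible exactly when $M$ is.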

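Theorem 7.3. Let $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$ be a quadratic form. Suppose a linear coordinate
change $\mathbf{x} = L\mathbf{u}$ can be chosen so that the variable $u$ does not appear in the formula

\[ \bar{Q}(u,v) = \bar{Q}(\mathbf{u}) = \mathbf{u}^\dagger L^\dagger M L\,\mathbf{u} \]

for the transformed quadratic form. Then the matrix $M$ of $Q$ is noninvertible and
conversely.

Proof. Let us first suppose $M$ is noninvertible. Then there is a nonzero vector $\mathbf{r}$ in
its kernel: $M\mathbf{r} = \mathbf{0}$. Choose a second vector $\mathbf{s}$ so that $\{\mathbf{r}, \mathbf{s}\}$ form a basis for $\mathbb{R}^2$, and
let $L$ be the invertible matrix whose columns are the vectors $\mathbf{r}$ and $\mathbf{s}$. If we write $\bar{Q}$
as transformed by $L$ in the form

\[ \bar{Q}(u,v) = \mathbf{u}^\dagger L^\dagger M L\,\mathbf{u} = \begin{pmatrix} u & v \end{pmatrix} \begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}, \]

then the variable $u$ will be missing from this expression if $\alpha = \beta = 0$, that is, if the
entries in the first row and the first column of $L^\dagger M L$ equal 0.

To show that $L^\dagger M L$ has this property, first write $L$ and $L^\dagger$ in the form

\[ L = \begin{pmatrix} \mathbf{r} & \mathbf{s} \end{pmatrix}, \qquad L^\dagger = \begin{pmatrix} \mathbf{r}^\dagger \\ \mathbf{s}^\dagger \end{pmatrix}. \]

(Note that $\mathbf{r}^\dagger$ and $\mathbf{s}^\dagger$ are row vectors.) Then matrix multiplication allows us to write
$ML$ in a similar way, as

\[ ML = \begin{pmatrix} M\mathbf{r} & M\mathbf{s} \end{pmatrix} = \begin{pmatrix} \mathbf{0} & M\mathbf{s} \end{pmatrix}. \]

It follows that

\[ L^\dagger M L = \begin{pmatrix} \mathbf{r}^\dagger \\ \mathbf{s}^\dagger \end{pmatrix} \begin{pmatrix} \mathbf{0} & M\mathbf{s} \end{pmatrix} = \begin{pmatrix} \mathbf{r}^\dagger \mathbf{0} & \mathbf{r}^\dagger M \mathbf{s} \\ \mathbf{s}^\dagger \mathbf{0} & \mathbf{s}^\dagger M \mathbf{s} \end{pmatrix} = \begin{pmatrix} 0 & \mathbf{r}^\dagger M \mathbf{s} \\ 0 & \mathbf{s}^\dagger M \mathbf{s} \end{pmatrix}. \]

The entries in the first column of $L^\dagger M L$ are therefore zero, and because the matrix
is symmetric, the entries in its first row must be zero as well.

To prove the converse, we suppose that one of the variables in $\mathbf{u} = (u,v)$ is missing
from the expression

\[ \bar{Q}(\mathbf{u}) = \mathbf{u}^\dagger L^\dagger M L\,\mathbf{u}. \]

Then $\bar{M} = L^\dagger M L$ has a row (and a column) of zeros, so $\det \bar{M} = 0$, implying $\bar{M}$ is
noninvertible. Consequently, $M = (L^\dagger)^{-1} \bar{M} L^{-1}$ is noninvertible, as well. □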
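The first half of the proof is constructive, and it can be traced on the gutter form $Q_g(u,v) = u^2 - 2uv + v^2$ from Example 1. The Python sketch below is an illustration under assumptions made here: the kernel vector `r` and the complementary vector `s` are hand-picked (any `s` independent of `r` would do), and the helper names are hypothetical.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[1, -1], [-1, 1]]     # matrix of u^2 - 2uv + v^2 (noninvertible)
r = [1, 1]                 # kernel vector: M r = 0
s = [0, 1]                 # hand-picked vector independent of r

# L has r and s as its columns, as in the proof
L = [[r[0], s[0]],
     [r[1], s[1]]]

M_r = [sum(M[i][k] * r[k] for k in range(2)) for i in range(2)]
M_bar = matmul(matmul(transpose(L), M), L)

print(M_r)     # the zero vector
print(M_bar)   # first row and first column are zero
```

With this choice the transformed form reduces to a single square in the second new variable, so the first new variable is missing, as the theorem predicts.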

As a result of Theorem 7.3, we find that the natural way to distinguish between 
quadratic forms is provided by the following definition. 

Definition 7.2 A quadratic form $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$ is nondegenerate if its matrix $M$ is
invertible, and is degenerate otherwise.

Corollary 7.4 The quadratic form $Q(x,y) = Ax^2 + 2Bxy + Cy^2$ is nondegenerate if
and only if $AC \neq B^2$.

Proof. The determinant of the matrix of $Q$ is $AC - B^2$; $Q$ is nondegenerate if and
only if this determinant is nonzero. □

To connect these general results about quadratic forms back to the local behavior 
of a function at a critical point, we introduce the Hessian. 


The Hessian 
Local behavior
and the Hessian 


Analogies 


Definition 7.3 Suppose the function $z = f(x,y)$ has continuous second derivatives
on a neighborhood of a critical point $(x,y) = (a,b)$. The Hessian of $f$ at $(a,b)$ is
the symmetric matrix of second derivatives

$$H_{(a,b)} = \begin{pmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{yx}(a,b) & f_{yy}(a,b) \end{pmatrix}.$$

The Hessian form of $f$ at $(a,b)$ is the quadratic form associated with the Hessian.

Continuity of the second derivatives guarantees that $H_{(a,b)}$ is symmetric. Because
there is usually no chance for confusion, we use the symbol $H_{(a,b)}$ for the Hessian
form as well; thus

$$H_{(a,b)}(x,y) = f_{xx}(a,b)\,x^2 + 2 f_{xy}(a,b)\,xy + f_{yy}(a,b)\,y^2.$$


Now assume that $f$ has continuous third derivatives near $(a,b)$ so we can write
the second-order Taylor expansion of $f$ at $(a,b)$. In terms of window coordinates
$\Delta x = x - a$, $\Delta y = y - b$ and $\Delta z = f(a + \Delta x, b + \Delta y) - f(a,b)$ and the Hessian form,
the expansion is simply

$$\Delta z = \tfrac{1}{2} H_{(a,b)}(\Delta x, \Delta y) + O(3).$$

This tells us the local behavior of $f$ near $(a,b)$, so we ask: when, and how, does the
Hessian determine that local behavior? In other words, when does the quadratic form
$H_{(a,b)}(\Delta x, \Delta y)$ dominate the higher-order terms represented by $O(3)$? The answer is
provided by Morse’s lemma.

Definition 7.4 Suppose the function $z = f(x,y)$ has continuous second derivatives
near the critical point $(a,b)$. Then $(a,b)$ is nondegenerate if the Hessian $H_{(a,b)}$ of $f$
at $(a,b)$ is nondegenerate, and is degenerate otherwise.

Theorem 7.5 (Morse’s lemma). Suppose $z = f(x,y)$ has continuous third derivatives
in a neighborhood of a nondegenerate critical point $(a,b)$. Then, in a sufficiently
small window centered at $(a,b)$, there is a coordinate change $(\Delta u, \Delta v) =
h(\Delta x, \Delta y)$ (nonlinear, in general) for which

$$\Delta z = \pm(\Delta u)^2 \pm (\Delta v)^2.$$

The signs of $(\Delta u)^2$ and $(\Delta v)^2$ are the signs of the eigenvalues of the Hessian $H_{(a,b)}$
of $f$ at $(a,b)$.

Proof. See the proof of the $n$-variable version, Theorem 7.16 (p. 248), in the next
section. □

The two eigenvalues of the Hessian are analogous to the single second derivative
in the one-variable version (Theorem 7.1, p. 221). The Hessian is symmetric;
therefore its eigenvalues are real (Exercise 2.14.a, p. 60). The critical point is
nondegenerate; therefore the eigenvalues are nonzero; the sign of each is either positive
or negative.
Corollary 7.6 (Second derivative test) Suppose $z = f(x,y)$ has continuous third
derivatives in a neighborhood of a critical point $(x,y) = (a,b)$. Then the nature of
the critical point depends on the values of the second partial derivatives of $f$ (all
evaluated at $(a,b)$), as follows.

• A saddle point if $f_{xx} f_{yy} - f_{xy}^2 < 0$

• A local minimum if $f_{xx} f_{yy} - f_{xy}^2 > 0$ and $f_{xx} + f_{yy} > 0$

• A local maximum if $f_{xx} f_{yy} - f_{xy}^2 > 0$ and $f_{xx} + f_{yy} < 0$

If $f_{xx} f_{yy} - f_{xy}^2 = 0$, the test is inconclusive.

Proof. According to Morse’s lemma, the nature of the critical point is determined
by the signs of the eigenvalues, as follows. If the eigenvalues have opposite signs,
then $\Delta z = \pm((\Delta u)^2 - (\Delta v)^2)$, a saddle; if both are positive, then $\Delta z = (\Delta u)^2 + (\Delta v)^2$,
a local minimum; if both are negative, then $\Delta z = -(\Delta u)^2 - (\Delta v)^2$, a local maximum;
if either is zero, Morse’s lemma does not apply.

If $\lambda_1$ and $\lambda_2$ are the eigenvalues of $H_{(a,b)}$, then

$$\lambda_1 \lambda_2 = \det H_{(a,b)} = f_{xx} f_{yy} - f_{xy}^2, \qquad \lambda_1 + \lambda_2 = \operatorname{tr} H_{(a,b)} = f_{xx} + f_{yy}.$$

All the assertions of the test now follow, including the final one about an inconclusive
result. □
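
The test in Corollary 7.6 translates directly into a few lines of code. What follows is a small sketch in plain Python; the function name `classify_critical_point` and its string labels are our own choices for illustration. It uses only the determinant and trace of the Hessian, exactly as in the proof.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second derivative test at a critical point, from the three second
    partial derivatives evaluated there (hypothetical helper)."""
    det = fxx * fyy - fxy ** 2   # product of the eigenvalues of the Hessian
    trace = fxx + fyy            # sum of the eigenvalues of the Hessian
    if det < 0:
        return "saddle"
    if det > 0:
        return "local minimum" if trace > 0 else "local maximum"
    return "inconclusive"

# Hessian diag(-4, -4): both eigenvalues negative.
at_origin = classify_critical_point(-4.0, 0.0, -4.0)
# Hessian with rows (8, 0), (0, 0): determinant zero.
on_ring = classify_critical_point(8.0, 0.0, 0.0)
```

When the determinant is positive the trace cannot vanish, so the two remaining branches cover every nondegenerate case.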

We now work through the details of a rather rich and varied example to see how
Morse’s lemma applies. The example begins with the function

$$z = f(x,y) = (x^2 + y^2 - 1)^2.$$

First of all, because $x$ and $y$ appear only in the form $x^2 + y^2$, the graph must be
rotationally symmetric around the $z$-axis. Furthermore, $z \ge 0$ (because $z$ is a
square), and $z$ attains its minimum value, $z = 0$, everywhere on the circle
$x^2 + y^2 = 1$. (These minima are thus nonisolated critical points; cf. page 225.) If
$(x,y)$ is near the origin, but $(x,y) \neq (0,0)$, then $z < 1$. But $z = 1$ when $(x,y) = (0,0)$,
so $z$ has a local maximum at the origin. The graph of $f$ therefore resembles the base
of a wine bottle. (The sediment that precipitates out of an old wine will settle into
the small space along the ring of minima.)






Example: 
the wine bottle 
Analyzing the
critical points of / 


The Hessian form 
is degenerate 


The missing variable 
in the Hessian 


The level curves reflect the circular symmetry; they are all concentric with the 
origin. Each level 0 < z < 1 consists of a pair of circles on either side of the level 
z = 0 (the unit circle). Each level above z = 1 is a single circle that lies outside the 
unit circle. 

Let us carry out a standard analysis of the critical points of $f$. We have

$$\frac{\partial f}{\partial x} = 4x(x^2 + y^2 - 1), \qquad \frac{\partial f}{\partial y} = 4y(x^2 + y^2 - 1),$$

so $(x,y) = (0,0)$ is a critical point in addition to each of the points where $x^2 + y^2 = 1$.
To apply the second derivative test, we need the Hessian, which equals

$$H_{(a,b)} = \begin{pmatrix} 4(a^2 + b^2 - 1) + 8a^2 & 8ab \\ 8ab & 4(a^2 + b^2 - 1) + 8b^2 \end{pmatrix}$$

at an arbitrary point $(a,b)$. At the origin,

$$H_{(0,0)} = \begin{pmatrix} -4 & 0 \\ 0 & -4 \end{pmatrix},$$

so the test succeeds and tells us that the origin is a (nondegenerate) local maximum.
At any point on $a^2 + b^2 = 1$, however, the Hessian reduces to

$$H_{(a,b)} = \begin{pmatrix} 8a^2 & 8ab \\ 8ab & 8b^2 \end{pmatrix}, \quad \text{and} \quad \det H_{(a,b)} = 64a^2b^2 - 64a^2b^2 = 0,$$

so the test fails. All points on the ring $a^2 + b^2 = 1$ of minima are degenerate critical
points of $f$. Consider now what this means for the Hessian form:

$$H_{(a,b)}(\Delta x, \Delta y) = 8a^2(\Delta x)^2 + 16ab\,\Delta x\,\Delta y + 8b^2(\Delta y)^2 = 8(a\,\Delta x + b\,\Delta y)^2.$$

The Hessian form involves only the square of a single quantity, $a\,\Delta x + b\,\Delta y$. Thus, if
we introduce the new variables

$$\Delta u = a\,\Delta x + b\,\Delta y, \qquad \Delta v = -b\,\Delta x + a\,\Delta y,$$

then the Hessian form is just $8(\Delta u)^2$. The variable $\Delta v$ is missing here, so the Hessian
is indeed degenerate in precisely the sense we have been using for quadratic forms.
The formulas for $\Delta u$ and $\Delta v$ give us new coordinates in the window centered at
$(x,y) = (a,b)$; the coordinate change is the linear map defined by the matrix

$$P = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}.$$

Because $a^2 + b^2 = 1$, it follows that $P$ is a pure rotation. Let $\theta$ be the angle from
the positive $x$-axis to the radial line from the origin to the point $(a,b)$ (so $\theta =
\arctan(b/a)$). Then $P$ is rotation by the angle $\arctan(-b/a) = -\arctan(b/a) = -\theta$.
This means that the positive $\Delta x$-axis lies at the angle $-\theta$ from the positive $\Delta u$-axis;
see Exercise 5.16, page 180. Consequently, the $\Delta u$-axis points in the same direction
as the vector $(a,b)$, the radial direction, so the $\Delta v$-axis is tangent to the ring of
minima. Compare this to our previous example: when the Hessian form had no
$y$-component, it had a line of critical points in the direction of the $y$-axis.
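
These claims about the ring of minima are easy to check numerically. The sketch below assumes NumPy; the helper name `wine_bottle_hessian` and the sample angle 0.7 are arbitrary choices made for illustration. Since $(\Delta u, \Delta v)^\dagger = P\,(\Delta x, \Delta y)^\dagger$ and $P^{-1} = P^\dagger$, the matrix of the Hessian form in the new coordinates is $P H P^\dagger$.

```python
import numpy as np

def wine_bottle_hessian(a, b):
    """Hessian of f(x, y) = (x^2 + y^2 - 1)^2 at the point (a, b)."""
    r2 = a * a + b * b
    return np.array([[4 * (r2 - 1) + 8 * a * a, 8 * a * b],
                     [8 * a * b, 4 * (r2 - 1) + 8 * b * b]])

theta = 0.7                           # any point on the unit circle will do
a, b = np.cos(theta), np.sin(theta)
H = wine_bottle_hessian(a, b)
P = np.array([[a, b], [-b, a]])       # (du, dv) = P (dx, dy)
H_new = P @ H @ P.T                   # matrix of the form in (du, dv)
```

Up to rounding, `H_new` has the single nonzero entry 8 in its upper-left corner, which is the statement that the form is $8(\Delta u)^2$ with $\Delta v$ missing.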



Modify the function
by tipping its graph

The purpose of our extended example is to see how Morse’s lemma illuminates
the structure of a function near a nondegenerate critical point. But the critical points
of the ring of minima are degenerate, so Morse’s lemma does not apply to them.
(Morse’s lemma does apply to the isolated maximum at the origin, but the character
of that critical point is already evident.) We have more success by first altering the
function so its ring of minima “breaks up” into just two isolated critical points. We
can do this by tipping the graph slightly, as in the figure in the margin, below. The
base, which had been sitting on the entire ring of minima, now shifts to rest on a
single point. This point is the absolute minimum of the new function. As we show
presently, the opposite point on the ring will shift into a saddle point. There are no
other critical points (besides the local maximum that persists near the origin). All
this happens no matter how slightly the graph is tipped.

Although it is easier to think of the tipping as a rotation (for example, a rotation
of the $(x,z)$-plane about the $y$-axis), the formula for the altered function will be
simpler if the tipping is done by a shear, again of the $(x,z)$-plane; see the example
in the margin. A vertical shear with slope $m$ is given by

$$(x, z) \mapsto (x,\ z + mx).$$

Critical points of $f_m$

The shear in the figure uses $m = 0.4$. We see it has the right sort of action on the
grid of squares and, at the same time, we see what it does to (a vertical slice of) the
graph of $f$. The formula for the sheared function $f_m$ is

$$z = f_m(x,y) = mx + f(x,y) = (x^2 + y^2 - 1)^2 + mx.$$


Notice that even the grid on the surface of the graph in the margin has been sheared. 
Also, the shearing has carried part of the graph below the negative x-axis, as we 
would expect. 

The vertical slice shows that the critical points of $z = f_m(x, 0)$ (marked by
the open dots in the $(x,z)$-plane on the right) are shifted in relation to those of
$z = f(x, 0)$: when $m > 0$, the minima move left and the maximum moves right.
In Exercise 7.3, you show that the critical points are approximately

$$\text{maximum: } x \approx \frac{m}{4}, \qquad \text{minima: } x \approx -\frac{m}{8} \pm 1,$$

when $m$ is small.


Let us now analyze the critical points of $z = f_m(x,y)$, where $m$ is small but
nonzero. We must have

$$\frac{\partial f_m}{\partial x} = 4x(x^2 + y^2 - 1) + m = 0, \qquad \frac{\partial f_m}{\partial y} = 4y(x^2 + y^2 - 1) = 0.$$

For $\partial f_m/\partial y = 0$ to hold, either $y = 0$ or $x^2 + y^2 - 1 = 0$. If we assume the second
of these equations, then $\partial f_m/\partial x = 0$ reduces to $m = 0$; but this contradicts our
assumption that $m \neq 0$. Hence, no point on the ring of minima of the original $f$ is
a critical point of the new function $f_m$. So let us assume instead that $y = 0$. Then
$\partial f_m/\partial x = 0$ reduces to

$$4x(x^2 - 1) + m = 4x^3 - 4x + m = 0.$$

When $m$ is small, this cubic has three real roots, $p_1$, $p_2$, $p_3$; see Exercise 7.3.



The figure above shows an alternate geometric approach to locating the critical
points. They appear as the points of intersection of the critical curves on which
$\partial f_m/\partial x = 0$ (shown dotted in the figure; $m = 0.3$) and $\partial f_m/\partial y = 0$ (the
circle-plus-line shown in gray). The curves intersect in the three points $p_1$, $p_2$, $p_3$ on the $x$-axis.
To determine the type of each critical point, we calculate the Hessian, restricting
ourselves to points of the form $(p, 0)$:

$$H_{(p,0)} = \begin{pmatrix} 12p^2 - 4 & 0 \\ 0 & 4p^2 - 4 \end{pmatrix}.$$

When $m$ is sufficiently small and positive, the three critical points satisfy

$$p_1 < -1, \qquad 0 < p_2 < 1/\sqrt{3}, \qquad 1/\sqrt{3} < p_3 < 1.$$

This allows us to make the following inferences about their Hessians:

$$H_{(p_1,0)} = \begin{pmatrix} + & 0 \\ 0 & + \end{pmatrix}, \qquad H_{(p_2,0)} = \begin{pmatrix} - & 0 \\ 0 & - \end{pmatrix}, \qquad H_{(p_3,0)} = \begin{pmatrix} + & 0 \\ 0 & - \end{pmatrix}.$$

It follows that $p_1$ is a (local) minimum, $p_2$ a (local) maximum, and $p_3$ a saddle.
(What happens if $m < 0$?)
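
For a concrete value of $m$, the three critical points can be located and classified numerically. The following sketch assumes NumPy and uses `numpy.roots` for the cubic; the function name `tipped_bottle_critical_points` and the sample value $m = 0.1$ are illustrative assumptions. It reads off the signs of the diagonal Hessian entries $12p^2 - 4$ and $4p^2 - 4$.

```python
import numpy as np

def tipped_bottle_critical_points(m):
    """Real roots p of 4p^3 - 4p + m = 0, each labelled by the type of the
    critical point (p, 0) of f_m (hypothetical helper)."""
    roots = np.roots([4.0, 0.0, -4.0, m])
    real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    result = []
    for p in real:
        fxx = 12 * p ** 2 - 4     # upper-left entry of the Hessian at (p, 0)
        fyy = 4 * p ** 2 - 4      # lower-right entry of the Hessian at (p, 0)
        if fxx * fyy == 0:
            kind = "degenerate"
        elif fxx * fyy < 0:
            kind = "saddle"
        elif fxx > 0:
            kind = "local minimum"
        else:
            kind = "local maximum"
        result.append((float(p), kind))
    return result

points = tipped_bottle_critical_points(0.1)
```

Calling the function with a small negative $m$ is one quick way to explore the question posed at the end of the paragraph above.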




Think of the figure on the left, above, as showing the graph of $f_m$ filled with
liquid up to the level of the saddle point $p_3$. The level curve of $f_m$ at that level
is a thin crescent that has the characteristic “X” shape (albeit elongated and bent)
where it passes through the saddle point itself. In the contour plot on the right, the
liquid surface is shown in gray. Outside the crescent, the spacing between successive
level curves is still $\Delta c = 0.25$, as it was for the original function in the contour
plot on page 229. However, at that spacing, no further level curves will be found
inside the crescent; the minimum point $p_1$ lies only about 0.2 units below the saddle.
The single curve that is shown inside (in the shaded crescent) is about 0.18 units
below the level of the saddle. As $m \to 0$, the shaded crescent shape gets thinner,
converging to the ring of minima when $m = 0$, and this contour plot becomes the
one on page 229.

Now let us see what Morse’s lemma tells us about the saddle point $(p_3, 0)$.
Fundamentally, it provides new curvilinear coordinates $(\Delta u, \Delta v)$ that will reduce the
window equation to $(\Delta u)^2 - (\Delta v)^2$. To understand this, we begin by constructing
the window equation at any point $(p, 0)$ on the $x$-axis:

$$\begin{aligned} \Delta z &= f_m(p + \Delta x, \Delta y) - f_m(p, 0) \\ &= (4p^3 - 4p + m)\,\Delta x + (6p^2 - 2)(\Delta x)^2 + (2p^2 - 2)(\Delta y)^2 \\ &\quad + 4p(\Delta x)^3 + 4p\,\Delta x(\Delta y)^2 + (\Delta x)^4 + 2(\Delta x)^2(\Delta y)^2 + (\Delta y)^4. \end{aligned}$$

The type of each
critical point

The crescent-shaped
level at the saddle

Completing the square

Morse’s observations


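At a critical point, $4p^3 - 4p + m = 0$, so $\Delta z$ loses its linear term (as we expect).
If the window equation were purely quadratic, of the form

$$\Delta z = A(\Delta x)^2 + 2B\,\Delta x\,\Delta y + C(\Delta y)^2,$$

with $A$, $B$, $C$ constants, then we could make $\Delta z$ a sum of two squares by the familiar
process of completing the square (assuming $A \neq 0$):

$$\begin{aligned} \Delta z &= A\left((\Delta x)^2 + 2\frac{B}{A}\,\Delta x\,\Delta y + \frac{B^2}{A^2}(\Delta y)^2\right) - \frac{B^2}{A}(\Delta y)^2 + C(\Delta y)^2 \\ &= A\left(\Delta x + \frac{B}{A}\,\Delta y\right)^2 - \left(\frac{B^2}{A} - C\right)(\Delta y)^2. \end{aligned}$$

To finish, let us suppose $A > 0$; this makes the first square positive and the second
negative (we expect the squares to have different signs at a saddle). The coordinate
change

$$h: \begin{cases} \Delta u = \sqrt{A}\left(\Delta x + \dfrac{B}{A}\,\Delta y\right), \\[2ex] \Delta v = \Delta y\,\sqrt{\dfrac{B^2}{A} - C}, \end{cases}$$

then gives $\Delta z = (\Delta u)^2 - (\Delta v)^2$, a simple sum of (positive and negative) squares.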
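
As a quick sanity check of the completing-the-square step, the identity $(\Delta u)^2 - (\Delta v)^2 = A(\Delta x)^2 + 2B\,\Delta x\,\Delta y + C(\Delta y)^2$ can be tested for sample constants. This is a minimal sketch in plain Python; the name `saddle_coordinates` and the numerical values are illustrative only, and the code assumes $A > 0$ and $B^2 > AC$ so that both square roots are real.

```python
import math

def saddle_coordinates(A, B, C, dx, dy):
    """Coordinate change for a saddle; assumes A > 0 and B*B - A*C > 0."""
    du = math.sqrt(A) * (dx + (B / A) * dy)
    dv = dy * math.sqrt(B * B / A - C)
    return du, dv

A, B, C = 2.0, 1.0, -3.0          # constants with B^2 - AC = 7 > 0
dx, dy = 0.3, -0.2
du, dv = saddle_coordinates(A, B, C, dx, dy)
quadratic = A * dx ** 2 + 2 * B * dx * dy + C * dy ** 2
difference_of_squares = du ** 2 - dv ** 2
```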

However, because the given $\Delta z$ is not purely quadratic, this approach seems futile.
But now Morse makes two crucial observations:

• The validity of the change of coordinates $h$ does not depend on the coefficients $A$,
$B$, and $C$ being constants; a quadratic form with variable coefficients can work,
too.

• The window equation at any critical point can be “disassembled” properly into a
quadratic form with variable coefficients.

He then provides (remarkably simple) instructions for disassembling the window
equation into the proper components. We define equivalent instructions below
(Lemma 7.3, p. 249), and they give us the following (see p. 251).

$$\begin{aligned} A &= 6p^2 - 2 + 4p\,\Delta x + (\Delta x)^2 + \tfrac{1}{3}(\Delta y)^2, \\ B &= \tfrac{4}{3}p\,\Delta y + \tfrac{2}{3}\Delta x\,\Delta y, \\ C &= 2p^2 - 2 + \tfrac{4}{3}p\,\Delta x + \tfrac{1}{3}(\Delta x)^2 + (\Delta y)^2. \end{aligned}$$
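
One can confirm that these variable coefficients reassemble the window equation exactly. The sketch below is plain Python with our own helper names (`f_m`, `morse_coefficients`) and an arbitrary sample displacement. As an assumption made only for this check, $m$ is recovered from the quoted value of $p_3$ through the critical-point equation $4p^3 - 4p + m = 0$.

```python
def f_m(x, y, m):
    """The tipped wine-bottle function."""
    return (x * x + y * y - 1) ** 2 + m * x

def morse_coefficients(p, dx, dy):
    """Variable coefficients A, B, C of the window equation at (p, 0)."""
    A = 6 * p ** 2 - 2 + 4 * p * dx + dx ** 2 + dy ** 2 / 3
    B = (4 * p * dy + 2 * dx * dy) / 3
    C = 2 * p ** 2 - 2 + (4 * p * dx + dx ** 2) / 3 + dy ** 2
    return A, B, C

p = 0.9872574766623532
m = 4 * p * (1 - p * p)     # makes (p, 0) a critical point of f_m
dx, dy = 0.01, -0.02        # a sample displacement inside the window
A, B, C = morse_coefficients(p, dx, dy)
dz = f_m(p + dx, dy, m) - f_m(p, 0.0, m)
form = A * dx ** 2 + 2 * B * dx * dy + C * dy ** 2
```

With this value of $p$ the recovered $m$ comes out very close to 0.1, and `dz` and `form` agree to rounding error.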
We see above the action of the map $h : (\Delta x, \Delta y) \mapsto (\Delta u, \Delta v)$, using the expressions
for $A$, $B$, and $C$ just given, with $p = p_3 = 0.9872574766623532$. On the left is the
window with its “native” $(\Delta x, \Delta y)$-coordinates. In the middle is the same
$(\Delta x, \Delta y)$-window but now overlaid with the curvilinear coordinates $(\Delta u, \Delta v)$ pulled back by $h$.
On the right is the (curved) image of the window as pushed forward by $h$ to the
$(\Delta u, \Delta v)$-plane. The windows are very small; the spacing in both coordinate grids is
0.005. It is clear that $h$ “squares up” the contours: the zero-level $\Delta z = 0$ becomes the
pair of perpendicular straight lines $\Delta v = \pm\Delta u$. Notice that the zero-level intersects
the $(\Delta u, \Delta v)$-grid lines in exactly the same places in both windows. The other two
contours are not equally spaced with $\Delta z = 0$ but are instead chosen at levels (namely,
$\Delta z = -0.0002$ and $\Delta z = 0.0006$) that show up well in the original (thin) window.

It remains to verify that $h$ is indeed a valid coordinate change (that is, an invertible
map) on some neighborhood of $(\Delta x, \Delta y) = (0,0)$. The functions that appear in
$h$ are smooth where they are defined; thus the inverse function theorem says it is
sufficient to show that the derivative $dh_{(0,0)}$ is invertible. This follows (cf. Exercise 7.4)
from

$$dh_{(0,0)} = \begin{pmatrix} \sqrt{6p^2 - 2} & 0 \\ 0 & \sqrt{2 - 2p^2} \end{pmatrix} \approx \begin{pmatrix} 1.96165 & 0 \\ 0 & 0.225045 \end{pmatrix}.$$

“Squaring up” contours
near the saddle point

h is invertible

The domain of
invertibility

The axes are eigendirections of the derivative $dh_{(0,0)}$; consequently, the image of
each axis under $h$ itself is tangent to the corresponding axis in the target. Moreover,
$h$ approximately doubles horizontal distances but compresses vertical distances to
less than a quarter of their original length. The figure above shows all this quite
clearly.

Morse’s lemma guarantees that there are curvilinear coordinates on some open
set around the critical point on which the function appears as a sum of squares. But
how large is that open set? It is the set on which the coordinate change map $h$ is
invertible. In this case, we can expect the invertibility to break down when the form

$$\Delta z = A\left(\Delta x + \frac{B}{A}\,\Delta y\right)^2 - \left(\frac{B^2}{A} - C\right)(\Delta y)^2$$

becomes degenerate. This will happen if either coefficient vanishes. Here the crucial
coefficient is the second one. The figure in the margin shows that the curve $B^2 =
AC$ contains points very close to $p_3$: $(\Delta x, \Delta y) = (0,0)$. Thus, only by keeping the
window at $p_3$ relatively narrow was it possible to avoid that curve and thus avoid
losing the invertibility of $h$.


Curvilinear coordinates
near the minimum

We can obtain curvilinear coordinates that “square up” the contours of $f_m$ around
its minimum point $(x,y) = (p_1, 0)$ using essentially the same coordinate transformation
$h$. Apart from the obvious change from $p = p_3$ to $p = p_1$, just one pair
of modifications is needed. First, because the critical point is now a minimum, the
window equation must be rewritten as a sum of positive squares,

$$\Delta z = A\left(\Delta x + \frac{B}{A}\,\Delta y\right)^2 + \left(C - \frac{B^2}{A}\right)(\Delta y)^2.$$

Note the change in the form of the coefficient of $(\Delta y)^2$; this forces a corresponding
alteration in the formula for $\Delta v$:

$$\Delta v = \Delta y\,\sqrt{C - \frac{B^2}{A}}.$$

The result is shown below. The “native” window, on the left, has the same proportions
as the one we used for the saddle point, but it is half again as large. The
source and the target of $h$ are drawn to the same scale (and a grid square in the
$(\Delta u, \Delta v)$-plane is 0.01 units on a side), making it evident that $h$ stretches the horizontal
direction but compresses the vertical. What the grids actually show us are the
effects of the pullback by $h^{-1}$: horizontal compression and vertical elongation. The
contours of $f$ are equally spaced, at the levels $\Delta z = 0.0005, 0.0010, 0.0015, 0.0020$.
Notice that each contour meets points of the $(\Delta u, \Delta v)$-grid in exactly the same places
in both windows.
Details for the derivative $dh_{(0,0)}$ are very similar to those for the saddle point; note
that the slight change in the definition of $\Delta v$ has caused $\sqrt{2 - 2p^2}$ to be replaced by
$\sqrt{2p^2 - 2}$:

$$dh_{(0,0)} = \begin{pmatrix} \sqrt{6p^2 - 2} & 0 \\ 0 & \sqrt{2p^2 - 2} \end{pmatrix} \approx \begin{pmatrix} 2.0367 & 0 \\ 0 & 0.2222 \end{pmatrix}.$$

Because $dh_{(0,0)}$ is once again a diagonal matrix, the image of each axis under $h$ is
tangent to its corresponding axis in the target. Horizontal lengths are approximately
doubled and vertical ones are compressed by the factor 2/9. We conclude that $h$
is locally invertible, giving valid curvilinear coordinates $(\Delta u, \Delta v)$ in some suitably
restricted window centered at the minimum point $(x,y) = (p_1, 0)$.

We now consider briefly a second function that is simpler than the wine bottle
but nevertheless illustrates new aspects of Morse’s lemma. The function is one
introduced by Descartes:

Invertibility of h

Example: the folium
of Descartes

$$z = f(x,y) = x^3 + y^3 - 3xy.$$


































































Some of its level curves near the origin are shown in the figure below. In the shaded
region, where the function takes positive values, the contour interval is $\Delta z = 1.5$; in
the unshaded region, we have used a smaller interval: $\Delta z = 0.2$. The zero-level curve
that separates the two regions includes a leaf-shaped loop that has led to the curve
being called the folium (“leaf”) of Descartes. We use the same name to refer to the
function itself.





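The level curves make it clear that $z = f(x,y)$ has a saddle at the origin and a
local minimum inside the “leaf,” and a quick calculation shows that the minimum is
at $(x,y) = (1,1)$. In terms of window coordinates $\Delta x = x - 1$, $\Delta y = y - 1$ centered at
the minimum, the formula for $f$ becomes

$$\begin{aligned} z &= (1 + \Delta x)^3 + (1 + \Delta y)^3 - 3(1 + \Delta x)(1 + \Delta y) \\ &= 1 + 3\Delta x + 3(\Delta x)^2 + (\Delta x)^3 + 1 + 3\Delta y + 3(\Delta y)^2 + (\Delta y)^3 \\ &\quad - 3 - 3\Delta x - 3\Delta y - 3\Delta x\,\Delta y \\ &= -1 + 3(\Delta x)^2 - 3\Delta x\,\Delta y + 3(\Delta y)^2 + (\Delta x)^3 + (\Delta y)^3. \end{aligned}$$

This reduces to

$$\Delta z = A(\Delta x)^2 + 2B\,\Delta x\,\Delta y + C(\Delta y)^2,$$

with $\Delta z = z + 1$ and

$$A = 3 + \Delta x, \qquad B = -3/2, \qquad C = 3 + \Delta y.$$

Action of the
coordinate change h

The standard coordinate change

$$h: \begin{cases} \Delta u = \sqrt{A}\left(\Delta x + \dfrac{B}{A}\,\Delta y\right), \\[2ex] \Delta v = \Delta y\,\sqrt{C - \dfrac{B^2}{A}}, \end{cases}$$

in the window then transforms $\Delta z$ into

$$\Delta z = (\Delta u)^2 + (\Delta v)^2.$$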
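
The reduction of the folium to a sum of squares can be checked at any sample point of the window. This is a short sketch in plain Python; the names `folium` and `folium_morse_map` and the sample displacement are our own, and the map is the coordinate change for a minimum, using $A = 3 + \Delta x$, $B = -3/2$, $C = 3 + \Delta y$.

```python
import math

def folium(x, y):
    return x ** 3 + y ** 3 - 3 * x * y

def folium_morse_map(dx, dy):
    """Coordinate change at the minimum (1, 1); needs A > 0 and C > B^2/A."""
    A = 3 + dx
    B = -1.5
    C = 3 + dy
    du = math.sqrt(A) * (dx + (B / A) * dy)
    dv = dy * math.sqrt(C - B * B / A)
    return du, dv

dx, dy = 0.2, -0.1
du, dv = folium_morse_map(dx, dy)
dz = folium(1 + dx, 1 + dy) - folium(1, 1)
sum_of_squares = du ** 2 + dv ** 2
```

The agreement is exact up to rounding, not merely to second order, because the cubic terms have been absorbed into the variable coefficients $A$ and $C$.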
















The contours in the original $(\Delta x, \Delta y)$-window are roughly elliptical. The map $h$ carries
them to concentric circles in the target $(\Delta u, \Delta v)$-plane. The figure below helps
us to follow the details. The curvilinear $(\Delta u, \Delta v)$ coordinates that are overlaid on the
source on the left are the ones pulled back from the target by $h$. Therefore, the
intersections between the original contours and the curvilinear grid in the source match
exactly the intersections between the image circles and the square grid in the target.






At this scale (the source window is a unit square), the contours are close to ellipses,
and $h$ looks almost linear. Its linear approximation at the origin is

$$dh_{(0,0)} = \begin{pmatrix} \sqrt{3} & -\sqrt{3}/2 \\ 0 & 3/2 \end{pmatrix}.$$

Action of h

The map resembles a horizontal shear that pushes points that lie above the horizontal
axis to the left and points below to the right. Horizontal distances are increased by
a factor of about $\sqrt{3} \approx 1.7$, and vertical ones by a factor of about 1.5. The effect of
the dilation is to make the ellipses both larger and somewhat wider; the effect of the
shear is then to turn them into circles.

Comparing the folium
and the wine bottle

Notice that the $\Delta u$- and $\Delta v$-axes we see overlaid on the source do not line up with
the major and minor axes of the nested ellipses. This points to the main difference
between the folium and the wine bottle examples. At the minimum of the tipped
wine bottle, the curvilinear coordinate axes were aligned with the principal axes of
the (approximate) ellipses, so the coordinate change $h$ had a simpler action there:
to turn the ellipses into circles, it just stretched the ellipses by two different factors
along their principal axes. As a consequence, the derivative $dh_{(0,0)}$ was a diagonal
matrix, representing a pure strain in the coordinate directions.
Quadratic forms
under rotations 


The ellipses 


The hyperbolas 


At the minimum point of the folium, however, the derivative is not a diagonal
matrix. Appearances to the contrary notwithstanding, it is a pure strain (rather
than the shear it appears to be) because its eigenvalues, $\sqrt{3}$ and $3/2$, are real
and unequal (cf. Theorem 2.6, p. 40). Hence we can convert $dh_{(0,0)}$ into a diagonal
matrix by using a further coordinate change that will align the new coordinate axes
with the strain directions, that is, with the principal axes of the ellipses. In fact, if we
restrict ourselves to an ordinary quadratic form with constant coefficients, we can
show that the additional coordinate change can be taken as a rotation (that aligns the
coordinate axes with the symmetry axes of the level curves).

Here are two examples of typical quadratic forms with their level curves. For 
each, we provide a rotation that transforms the form into a sum of squares, allowing 
us to infer analytically the shape of its level curves. 


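$$Q_{\text{ell}} = 6x^2 - 4xy + 3y^2, \qquad Q_{\text{hyp}} = x^2 + 6xy + y^2.$$

[Figures: level curves of $Q_{\text{ell}}$ and of $Q_{\text{hyp}}$ in the $(x,y)$-plane.]

Under the respective coordinate changes

$$h_{\text{ell}}: \begin{cases} x = \dfrac{u - 2v}{\sqrt{5}}, \\[2ex] y = \dfrac{2u + v}{\sqrt{5}}, \end{cases} \qquad h_{\text{hyp}}: \begin{cases} x = \dfrac{u + v}{\sqrt{2}}, \\[2ex] y = \dfrac{-u + v}{\sqrt{2}}, \end{cases}$$

the two quadratic forms pull back to

$$Q^*_{\text{ell}} = 2u^2 + 7v^2, \qquad Q^*_{\text{hyp}} = -2u^2 + 4v^2.$$

The map $h_{\text{ell}}$ is rotation by $\theta = \arctan 2$, and $h_{\text{hyp}}$ is rotation by $\theta = -45°$. The
rotations cause the $(u,v)$-coordinates to line up with what appear to be the symmetry
axes of the level curves. We say that the quadratic forms have been transformed to
principal axes.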

The equation $Q^*_{\text{ell}} = r$ ($r > 0$) describes an ellipse whose principal axes are the
coordinate axes. All the different ellipses (i.e., for different $r > 0$) have the same
proportions; that is, they are similar figures in the sense of Euclidean geometry.
Because rotation preserves lengths and angles, we conclude that the level curves of
$Q_{\text{ell}}$ are nested similar ellipses that share their principal axes.

Likewise, $Q^*_{\text{hyp}} = r$ describes a hyperbola whose principal axes are the coordinate
axes. All the different hyperbolas have the same asymptotes; these are the
straight lines (“degenerate hyperbolas”) defined by $Q^*_{\text{hyp}} = 0$. In the
$(u,v)$-coordinates, the asymptotes have the equations $v = \pm u/\sqrt{2}$; in the original
$(x,y)$-coordinates, we can get the equations either by substitution using $h_{\text{hyp}}$ or by
completing the square:

$$-8x^2 + (3x + y)^2 = 0 \quad \text{or} \quad y = (-3 \pm \sqrt{8})\,x.$$

Because rotation preserves lengths and angles, we conclude that the level curves of
$Q_{\text{hyp}}$ are hyperbolas that share asymptotes and principal axes.

The coefficients
are eigenvalues

Not only do the signs of the coefficients in the formulas for $Q^*_{\text{ell}}$ and $Q^*_{\text{hyp}}$ have
geometric meaning, their ratio does, too: it determines the “aspect ratio” of the level
curves. For example, in the first figure, seven ellipses cross the $v$-axis in the same
distance that just two cross the $u$-axis. In the second figure, six hyperbolas cross the
$v$-axis in the distance that three cross the $u$-axis. Call the ratio of the numbers in
each pair the aspect ratio of the curves. This ratio is the same as (the absolute value
of) the ratio of the eigenvalues of the symmetric matrices that define the forms:

$$M_{\text{ell}} = \begin{pmatrix} 6 & -2 \\ -2 & 3 \end{pmatrix}, \qquad M_{\text{hyp}} = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix},$$

$$p_{\text{ell}}(\lambda) = \lambda^2 - 9\lambda + 14 = (\lambda - 2)(\lambda - 7), \qquad p_{\text{hyp}}(\lambda) = \lambda^2 - 2\lambda - 8 = (\lambda + 2)(\lambda - 4).$$

Thus, the ratio of the eigenvalues determines the geometry of the level curves: the
sign of the ratio indicates the kind of curves ($+$ for ellipses, $-$ for hyperbolas) and
its magnitude indicates their aspect ratio.

As we have seen (cf. p. 226), when a linear map $\mathbf{x} = L\mathbf{u}$ is used to change coordinates
in the quadratic form $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$ defined by a symmetric $2 \times 2$ matrix $M$,
the matrix of the transformed quadratic form is $M^* = L^\dagger M L$:

$$Q^*(\mathbf{u}) = Q(L\mathbf{u}) = (L\mathbf{u})^\dagger M (L\mathbf{u}) = \mathbf{u}^\dagger L^\dagger M L\,\mathbf{u} = \mathbf{u}^\dagger M^* \mathbf{u}.$$

But the rotation matrices we are now using for coordinate changes have a special
property: the transpose of a rotation is its inverse:

$$R_\theta^{-1} = R_{-\theta} = R_\theta^\dagger.$$

Therefore, when a rotation $R$ is used to transform a quadratic form, we can write the
relation between the two matrices defining the forms in a new way: $M^* = R^{-1} M R$. In
particular, if the transformed $Q^*$ is a sum of squares, then its matrix $M^*$ is a diagonal
matrix, and we have the following result.

Theorem 7.7. If the rotation $\mathbf{x} = R\mathbf{u}$ transforms the quadratic form $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$
into $Q^*(\mathbf{u}) = \mathbf{u}^\dagger D \mathbf{u}$, where $D$ is a diagonal matrix, then the diagonal elements of $D$
are the eigenvalues of $M$ and the columns of $R$ are corresponding eigenvectors.
Transforming to
principal axes 


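Proof. Let the diagonal elements of $D$ be $\alpha_1$ and $\alpha_2$, and let $\mathbf{e}_1 = (1,0)^\dagger$ and $\mathbf{e}_2 =
(0,1)^\dagger$ be the standard basis vectors in $\mathbb{R}^2$. Then

$$D\mathbf{e}_1 = \alpha_1 \mathbf{e}_1, \qquad D\mathbf{e}_2 = \alpha_2 \mathbf{e}_2,$$

so $\alpha_i$ is an eigenvalue of $D$ with eigenvector $\mathbf{e}_i$, $i = 1, 2$. By assumption, $D = R^\dagger M R =
R^{-1} M R$, or $RD = MR$. Let $\mathbf{v}_i = R\mathbf{e}_i$; this is the $i$-th column of $R$. We find

$$\alpha_i \mathbf{v}_i = R(\alpha_i \mathbf{e}_i) = RD\mathbf{e}_i = MR\mathbf{e}_i = M\mathbf{v}_i, \qquad i = 1, 2,$$

implying that $\alpha_i$ is an eigenvalue of $M$ with eigenvector $\mathbf{v}_i$. □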

Using Theorem 7.7 as a guide, we now have a way to transform a quadratic form 
to principal axes, that is, a way to construct a rotation x = Ru that will align the 
u-coordinate axes with the principal axes of the curves Q(x) = constant and reduce 
the form to a sum of squares. 

Theorem 7.8 (Principal axes theorem). For any quadratic form $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$,
there is a rotation $\mathbf{x} = R\mathbf{u}$ that transforms $Q$ into a sum of squares $Q^*(\mathbf{u}) = \lambda_1 u^2 +
\lambda_2 v^2$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $M$.

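Proof. We use a proof that extends naturally to quadratic forms in $n$ variables. We
know $M$ has a real eigenvalue $\lambda_1$ with an eigenvector $\mathbf{v}$ that we can assume to be a
unit vector. Let $\mathbf{w}$ be a unit vector orthogonal to $\mathbf{v}$, chosen so the square $\mathbf{v} \wedge \mathbf{w}$ has
positive orientation (cf. p. 41). Let $R$ be the matrix whose columns are $\mathbf{v}$ and $\mathbf{w}$, in
that order:

$$R = \begin{pmatrix} \mathbf{v} & \mathbf{w} \end{pmatrix}, \qquad R^\dagger = \begin{pmatrix} \mathbf{v}^\dagger \\ \mathbf{w}^\dagger \end{pmatrix}, \qquad R^\dagger R = \begin{pmatrix} \mathbf{v}^\dagger \mathbf{v} & \mathbf{v}^\dagger \mathbf{w} \\ \mathbf{w}^\dagger \mathbf{v} & \mathbf{w}^\dagger \mathbf{w} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

so $R^{-1} = R^\dagger$. Then $MR = \begin{pmatrix} M\mathbf{v} & M\mathbf{w} \end{pmatrix} = \begin{pmatrix} \lambda_1 \mathbf{v} & M\mathbf{w} \end{pmatrix}$, and

$$R^{-1} M R = R^\dagger M R = \begin{pmatrix} \lambda_1 \mathbf{v}^\dagger \mathbf{v} & \mathbf{v}^\dagger M \mathbf{w} \\ \lambda_1 \mathbf{w}^\dagger \mathbf{v} & \mathbf{w}^\dagger M \mathbf{w} \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \beta \end{pmatrix} = D,$$

where $\beta = \mathbf{w}^\dagger M \mathbf{w}$. The lower-left term of $D$ is zero because $\mathbf{v}$ and $\mathbf{w}$ are orthogonal;
the upper-right term is zero because $D = R^\dagger M R$ is symmetric. The proof of
Theorem 7.7 shows that $\beta = \lambda_2$, the second eigenvalue of $M$, and that $\mathbf{w}$ is a
corresponding eigenvector.

Finally, because $\mathbf{v}$ lies on the unit circle and $\mathbf{w}$ lies 90° counterclockwise from it,

$$\mathbf{v} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad \mathbf{w} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix},$$

for some $0 \le \theta < 2\pi$. Thus $R = R_\theta$. □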
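
In practice the rotation of the principal axes theorem can be obtained from a symmetric eigenvalue routine. The sketch below assumes NumPy and uses `numpy.linalg.eigh`; the function name `principal_axes` is our own. It is applied to the matrices of the two sample forms $6x^2 - 4xy + 3y^2$ and $x^2 + 6xy + y^2$ discussed earlier.

```python
import numpy as np

def principal_axes(M):
    """Eigenvalues (ascending) and a rotation R with R^T M R diagonal."""
    M = np.asarray(M, dtype=float)
    eigvals, R = np.linalg.eigh(M)   # columns of R are orthonormal eigenvectors
    if np.linalg.det(R) < 0:         # turn a reflection into a rotation
        R[:, 1] = -R[:, 1]
    return eigvals, R

M_ell = np.array([[6.0, -2.0], [-2.0, 3.0]])
M_hyp = np.array([[1.0, 3.0], [3.0, 1.0]])
lam_ell, R_ell = principal_axes(M_ell)
lam_hyp, R_hyp = principal_axes(M_hyp)
D_ell = R_ell.T @ M_ell @ R_ell
D_hyp = R_hyp.T @ M_hyp @ R_hyp
```

The rotation returned may differ from the one written in the text by a half-turn, because an eigenvector is determined only up to sign; the diagonal matrix is the same either way.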

Corollary 7.9 Level curves of the quadratic form $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$ are ellipses if
$\det M > 0$ and are hyperbolas if $\det M < 0$.

Proof. Transforming $Q$ to principal axes gives $Q^*(\mathbf{u}) = \lambda_1 u^2 + \lambda_2 v^2$, where $\det M =
\lambda_1 \lambda_2$. The level curves are ellipses when this product is positive and hyperbolas
when it is negative. □


7.3 Morse’s lemma 

In this section we show that the local behavior of a function $z = f(x_1, \ldots, x_n)$ at a
nondegenerate critical point is determined by its Hessian matrix of second derivatives
at the critical point. The key step is Morse’s lemma, which provides a coordinate
change that reduces the function to a pure sum of squares near the critical point.
We begin by transferring to $n$ dimensions all the terms and concepts introduced in
the previous section.

Definition 7.5 A quadratic form in $n$ variables is a function of the form

$$Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x},$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ (treated as a column vector) and $M$ is an $n \times n$ matrix.

Note that the matrix $M$ need not be symmetric, nor is it uniquely defined by $Q$;
see the exercises. In fact, adding an antisymmetric matrix $R$ to $M$ does not alter $Q$,
because $R$ by itself defines the quadratic form that is identically zero. The next
lemma is the converse; it says that only the antisymmetric matrices define the zero
form. (An antisymmetric matrix is also said to be skew-symmetric.)

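Lemma 7.1. Suppose the quadratic form $Q_0(\mathbf{x}) = \mathbf{x}^\dagger R \mathbf{x}$ is identically zero; then the
matrix $R = (r_{ij})$ is antisymmetric; that is, $r_{ji} = -r_{ij}$ for every $i, j = 1, \ldots, n$.

Proof. We evaluate $Q_0(\mathbf{x})$ for particular vectors $\mathbf{x}$. First take $\mathbf{x} = \mathbf{e}_i$, the $i$th standard
basis vector in $\mathbb{R}^n$. Then $0 = Q_0(\mathbf{e}_i) = r_{ii}$. Next, take $\mathbf{x} = \mathbf{e}_i + \mathbf{e}_j$, $i \neq j$; then, because
$r_{ii} = r_{jj} = 0$,

$$0 = Q_0(\mathbf{e}_i + \mathbf{e}_j) = r_{ij} + r_{ji}. \qquad □$$

According to the next lemma, with each quadratic form $Q$ we can associate a
unique symmetric matrix $M$ that defines the form: $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$. We write $Q \leftrightarrow M$
to indicate this association.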

Lemma 7.2. Suppose Q(x) = x' Mx is a quadratic form, where M is an arbitrary 
n x n matrix. Then M = (M + M^)/2 is symmetric and defines the same quadratic 
form. Moreover, if S is symmetric and Q(x) = x^Sx, then S = M. 

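Proof. Let $Q^\dagger(\mathbf{x}) = \mathbf{x}^\dagger M^\dagger \mathbf{x}$ be the quadratic form defined by the transpose
matrix $M^\dagger$. Because $Q^\dagger(\mathbf{x})$ is just a scalar (a $1 \times 1$ matrix), it is equal to its own
transpose; thus

\[ Q^\dagger(\mathbf{x}) = (\mathbf{x}^\dagger M^\dagger \mathbf{x})^\dagger = \mathbf{x}^\dagger M \mathbf{x} = Q(\mathbf{x}). \]

In other words, even when $M$ and $M^\dagger$ are different, they define the same quadratic
form. Now let $\overline{Q}$ be the quadratic form defined by $\overline{M} = (M + M^\dagger)/2$. Then

\[ \overline{Q}(\mathbf{x}) = \mathbf{x}^\dagger\,\tfrac12 (M + M^\dagger)\,\mathbf{x}
   = \tfrac12\bigl(\mathbf{x}^\dagger M \mathbf{x} + \mathbf{x}^\dagger M^\dagger \mathbf{x}\bigr)
   = \tfrac12\bigl(Q(\mathbf{x}) + Q^\dagger(\mathbf{x})\bigr) = Q(\mathbf{x}). \]

If $Q(\mathbf{x}) = \mathbf{x}^\dagger S \mathbf{x}$, then the quadratic form $Q_0$ defined by the symmetric matrix
$R = S - \overline{M}$ must be identically zero: $Q_0(\mathbf{x}) = \mathbf{x}^\dagger R\,\mathbf{x} = 0$. By the previous lemma,
$R$ is also antisymmetric, so $R$ must be the zero matrix, implying that $S = \overline{M}$. □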
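The symmetrization in Lemma 7.2 is easy to check numerically. The following is a minimal sketch, assuming a small non-symmetric matrix chosen purely for illustration; the helper names `quad_form` and `symmetrize` are hypothetical and not from the text.

```python
from fractions import Fraction

def quad_form(M, x):
    """Evaluate Q(x) = x^T M x for a square matrix M given as a list of rows."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def symmetrize(M):
    """Return (M + M^T)/2, the symmetric matrix described in Lemma 7.2."""
    n = len(M)
    return [[Fraction(M[i][j] + M[j][i], 2) for j in range(n)] for i in range(n)]

# Hypothetical non-symmetric matrix, chosen only for illustration.
M = [[1, 4, 0],
     [2, 3, -1],
     [6, 5, 2]]
M_bar = symmetrize(M)

# Compare the two forms on a few sample vectors (exact rational arithmetic).
samples = [(1, 0, 0), (1, 1, 0), (2, -1, 3), (0, 5, -2)]
same_form = all(quad_form(M, x) == quad_form(M_bar, x) for x in samples)
is_symmetric = all(M_bar[i][j] == M_bar[j][i] for i in range(3) for j in range(3))
print(same_form, is_symmetric)
```

Agreement on a handful of vectors is only a spot check, of course; the lemma itself is what guarantees agreement everywhere.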

We now single out the degenerate quadratic forms as the ones that are either 
missing a variable or can be so transformed by a suitable linear coordinate change. 
We show, as we did in the two-variable case, that a form is degenerate in this sense 
precisely when its associated matrix is noninvertible. 

Theorem 7.10. Let $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$ be a quadratic form in $n$ variables, where $M$ is
the symmetric matrix associated with $Q$. Suppose a linear coordinate change $\mathbf{x} = L\mathbf{u}$
can be chosen so that the variable $u_1$ does not appear in the formula

\[ \overline{Q}(u_1, \ldots, u_n) = \overline{Q}(\mathbf{u}) = Q(L\mathbf{u}) = \mathbf{u}^\dagger L^\dagger M L\,\mathbf{u} \]

for the transformed quadratic form. Then $M$ is noninvertible, and conversely.


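Proof. Let us first suppose $M$ is noninvertible. Then there is a nonzero vector $\mathbf{r}$
in its kernel: $M\mathbf{r} = \mathbf{0}$. Choose additional vectors $\mathbf{s}_2, \ldots, \mathbf{s}_n$ so that the $n$ vectors
$\{\mathbf{r}, \mathbf{s}_2, \ldots, \mathbf{s}_n\}$ form a basis for $\mathbb{R}^n$, and let $L$ be the invertible matrix whose columns
are the vectors $\mathbf{r}, \mathbf{s}_2, \ldots, \mathbf{s}_n$, in that order. The variable $u_1$ will be missing from

\[ \overline{Q}(\mathbf{u}) = \mathbf{u}^\dagger L^\dagger M L\,\mathbf{u} \]

if all the entries in the first row and the first column of the matrix $L^\dagger M L$ are zero.

To show that $L^\dagger M L$ has this property, first write $L$ and $L^\dagger$ in the form

\[ L = \begin{pmatrix} \mathbf{r} & \mathbf{s}_2 & \cdots & \mathbf{s}_n \end{pmatrix}
   \quad\text{and}\quad
   L^\dagger = \begin{pmatrix} \mathbf{r}^\dagger \\ \mathbf{s}_2^\dagger \\ \vdots \\ \mathbf{s}_n^\dagger \end{pmatrix}. \]

Then matrix multiplication allows us to write the $n \times n$ matrix $ML$ in a similar way,
as

\[ ML = \begin{pmatrix} M\mathbf{r} & M\mathbf{s}_2 & \cdots & M\mathbf{s}_n \end{pmatrix}
      = \begin{pmatrix} \mathbf{0} & M\mathbf{s}_2 & \cdots & M\mathbf{s}_n \end{pmatrix}. \]

In that case,

\[ L^\dagger M L
   = \begin{pmatrix} \mathbf{r}^\dagger \\ \mathbf{s}_2^\dagger \\ \vdots \\ \mathbf{s}_n^\dagger \end{pmatrix}
     \begin{pmatrix} \mathbf{0} & M\mathbf{s}_2 & \cdots & M\mathbf{s}_n \end{pmatrix}
   = \begin{pmatrix}
       0 & \mathbf{r}^\dagger M \mathbf{s}_2 & \cdots & \mathbf{r}^\dagger M \mathbf{s}_n \\
       0 & \mathbf{s}_2^\dagger M \mathbf{s}_2 & \cdots & \mathbf{s}_2^\dagger M \mathbf{s}_n \\
       \vdots & \vdots & \ddots & \vdots \\
       0 & \mathbf{s}_n^\dagger M \mathbf{s}_2 & \cdots & \mathbf{s}_n^\dagger M \mathbf{s}_n
     \end{pmatrix}. \]

Every entry in the first column is 0; but $L^\dagger M L$ is symmetric, so every entry in the
first row is 0, as well. Thus the first variable, $u_1$, is everywhere missing from $\overline{Q}(\mathbf{u})$.

To prove the converse, suppose that the $j$th variable is missing from the expression
of a quadratic form. Then the $j$th row and $j$th column of the matrix associated
with the form contain only zeros, implying the determinant of the matrix is zero and
the matrix is noninvertible. □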
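The construction in the first half of the proof can be traced on a small case. The sketch below assumes the singular symmetric matrix of the form $(x_1 + x_2)^2$, picked only as an illustration; the helpers `transpose` and `matmul` are hypothetical names.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical singular symmetric matrix: Q(x) = (x1 + x2)^2.
M = [[1, 1],
     [1, 1]]
r = [1, -1]    # kernel vector: M r = 0
s2 = [0, 1]    # any vector that completes r to a basis

# L has r and s2 as its columns, in that order.
L = transpose([r, s2])

# The transformed matrix L^T M L should have a zero first row and first column.
LtML = matmul(transpose(L), matmul(M, L))
print(LtML)
```

In the new coordinates the form reduces to $u_2^2$, so $u_1$ is missing, as the theorem predicts for a noninvertible matrix.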

Definition 7.6 A quadratic form Q is nondegenerate if its associated symmetric 
matrix is invertible, and is degenerate otherwise. 

Corollary 7.11 A quadratic form is nondegenerate if and only if the eigenvalues of 
its associated symmetric matrix are all nonzero. 

Proof. The determinant of a matrix equals the product of its eigenvalues, so the 
matrix is invertible if and only if all its eigenvalues are nonzero. □ 

There is more we must say about the eigenvalues of the symmetric matrix M 
of a quadratic form. Because we obtain eigenvalues as the roots of a polynomial, 
in general those eigenvalues are complex numbers, even when the entries of M are 
all real numbers. However, the eigenvalues associated with a quadratic form via its 
symmetric matrix are all real. 

Theorem 7.12. If $M$ is a symmetric $n \times n$ matrix with real entries, then all the
eigenvalues of $M$ are real numbers.

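Proof. Let $p(\lambda)$ be the characteristic polynomial of $M$ (Definition 2.1, p. 35); by
the fundamental theorem of algebra, there are $n$ (not necessarily distinct) complex
numbers $\lambda_1, \ldots, \lambda_n$ that are the roots of $p(\lambda) = 0$. For each distinct root $\lambda = \alpha + i\beta$
(with $\alpha$ and $\beta$ real), there is a complex eigenvector $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ such that $M\mathbf{z} = \lambda\mathbf{z}$
and $\mathbf{z} \ne \mathbf{0}$. Each of these has a complex conjugate:

\[ \bar\lambda = \alpha - i\beta, \qquad \bar{\mathbf{z}} = \mathbf{x} - i\mathbf{y}. \]

If $\bar\lambda = \lambda$, then $\beta = 0$ so $\lambda = \alpha$, a real number.

Thus, to prove the theorem, we show $\bar\lambda = \lambda$; to do this, we calculate the matrix
product $\bar{\mathbf{z}}^\dagger M \mathbf{z}$ two ways. First,

\[ \bar{\mathbf{z}}^\dagger M \mathbf{z} = \bar{\mathbf{z}}^\dagger (\lambda \mathbf{z}) = \lambda(\bar{\mathbf{z}}^\dagger \mathbf{z}). \]

In the second calculation, we use the fact that $\overline{M} = M^\dagger = M$, because $M$ is symmetric
and real, and we equate $\bar{\mathbf{z}}^\dagger M \mathbf{z}$ and $\bar{\mathbf{z}}^\dagger \mathbf{z}$ with their transposes because they are scalars:

\[ \bar{\mathbf{z}}^\dagger M \mathbf{z} = (\bar{\mathbf{z}}^\dagger M \mathbf{z})^\dagger = \mathbf{z}^\dagger M^\dagger \bar{\mathbf{z}}
   = \mathbf{z}^\dagger \overline{M}\,\bar{\mathbf{z}} = \mathbf{z}^\dagger \bar\lambda\,\bar{\mathbf{z}}
   = \bar\lambda(\mathbf{z}^\dagger \bar{\mathbf{z}}) = \bar\lambda(\mathbf{z}^\dagger \bar{\mathbf{z}})^\dagger = \bar\lambda(\bar{\mathbf{z}}^\dagger \mathbf{z}). \]

Thus $\lambda(\bar{\mathbf{z}}^\dagger \mathbf{z}) = \bar\lambda(\bar{\mathbf{z}}^\dagger \mathbf{z})$, and because $\bar{\mathbf{z}}^\dagger \mathbf{z} = \mathbf{x}^\dagger\mathbf{x} + \mathbf{y}^\dagger\mathbf{y} = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 > 0$, we can
divide by $\bar{\mathbf{z}}^\dagger \mathbf{z}$ and conclude $\lambda = \bar\lambda$. □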
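For a $2 \times 2$ matrix the conclusion can also be seen directly from the quadratic formula. The following worked case is offered only as an illustration; the entries $a$, $b$, $d$ are symbols introduced here, not taken from the text.

```latex
M = \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \qquad
p(\lambda) = \det(M - \lambda I) = \lambda^2 - (a + d)\lambda + (ad - b^2),
\\[4pt]
\text{discriminant} = (a + d)^2 - 4(ad - b^2) = (a - d)^2 + 4b^2 \ge 0 .
```

Because the discriminant is a sum of squares, both roots of $p$ are real whenever $a$, $b$, and $d$ are real. The symmetry of $M$ is what produces the term $4b^2$; a non-symmetric matrix with off-diagonal entries $b$ and $c$ would give $4bc$ instead, which can be negative.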

Although the eigenvalues of a real symmetric matrix must be real, the eigenvectors
need not be. For example, every nonzero complex vector is an eigenvector of the
identity matrix (with real eigenvalue 1). However, we can show that, in a sense, the
complex eigenvectors are superfluous: there is always a real eigenvector associated
with each real eigenvalue of a real matrix, symmetric or otherwise.

Theorem 7.13. Suppose $\mathbf{z}$ is a complex eigenvector of the real matrix $M$, associated
with the real eigenvalue $\lambda$. Then the real and imaginary parts of $\mathbf{z}$ are (real)
eigenvectors of $M$ associated with $\lambda$.

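Proof. Write $\mathbf{z} = \mathbf{x} + i\mathbf{y}$; then $\lambda(\mathbf{x} + i\mathbf{y}) = \lambda\mathbf{z} = M\mathbf{z} = M(\mathbf{x} + i\mathbf{y})$. The real and
imaginary parts of this equation hold separately; because $\lambda$ and $M$ are real, the real and
imaginary parts are

\[ \lambda\mathbf{x} = M\mathbf{x}, \qquad \lambda\mathbf{y} = M\mathbf{y}. \] □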

Thus all the eigenvalues of a symmetric matrix are real, and each distinct eigenvalue
has a corresponding real eigenvector. We are now ready to introduce the Hessian
of a function of $n$ variables and begin the local analysis of that function near a
critical point.

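Definition 7.7 Suppose the function $z = f(\mathbf{x})$ has continuous second derivatives on
a neighborhood of a critical point $\mathbf{x} = \mathbf{a}$. The Hessian of $f$ at $\mathbf{a}$ is the symmetric
matrix of second derivatives

\[ H_{\mathbf{a}} = \begin{pmatrix}
     f_{11}(\mathbf{a}) & \cdots & f_{1n}(\mathbf{a}) \\
     \vdots & \ddots & \vdots \\
     f_{n1}(\mathbf{a}) & \cdots & f_{nn}(\mathbf{a})
   \end{pmatrix}. \]

The Hessian form of $f$ at $\mathbf{a}$ is the quadratic form associated with the Hessian.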

As we noted already in the two-variable case, continuity of the second derivatives
guarantees that $H_{\mathbf{a}}$ is symmetric. Moreover, we continue to use the symbol $H_{\mathbf{a}}$ for
the Hessian form as well; thus

\[ H_{\mathbf{a}}(x_1, \ldots, x_n) = f_{11}(\mathbf{a})\,x_1^2 + 2 f_{12}(\mathbf{a})\,x_1 x_2 + \cdots + f_{nn}(\mathbf{a})\,x_n^2. \]

Definition 7.8 Suppose the function $z = f(\mathbf{x})$ has continuous second derivatives
near the critical point $\mathbf{a}$. Then $\mathbf{a}$ is nondegenerate if the Hessian $H_{\mathbf{a}}$ of $f$ at $\mathbf{a}$ is
nondegenerate, and is degenerate otherwise.

Our goal is to show that coordinate changes can put a function into a particularly 
simple form near a nondegenerate critical point. But we must ask: can a coordinate 
change eliminate a critical point, or can it convert a nondegenerate critical point into 
a degenerate one? We now show that criticality and nondegeneracy are geometric 
properties of functions, unaltered by coordinate changes. 

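Theorem 7.14. Suppose the coordinate change $\mathbf{x} = \mathbf{h}(\mathbf{u})$ transforms $f(\mathbf{x})$ into $g(\mathbf{u})$:
$f(\mathbf{x}) = f(\mathbf{h}(\mathbf{u})) = g(\mathbf{u})$. Then $z = f(\mathbf{x})$ has a critical point at $\mathbf{x} = \mathbf{a} = \mathbf{h}(\mathbf{b})$ if and
only if $z = g(\mathbf{u})$ has a critical point at $\mathbf{u} = \mathbf{b}$.

Proof. By the chain rule, $dg_{\mathbf{b}} = df_{\mathbf{a}} \circ d\mathbf{h}_{\mathbf{b}}$. Because $\mathbf{h}$ is a coordinate change,
$d\mathbf{h}_{\mathbf{b}}$ is invertible; hence

\[ dg_{\mathbf{b}} = 0 \iff df_{\mathbf{a}} = 0. \] □

Theorem 7.15. Suppose the coordinate change $\mathbf{x} = \mathbf{h}(\mathbf{u})$ transforms $f(\mathbf{x})$ into $g(\mathbf{u})$,
where $f$ and $g$ have continuous third derivatives. Then $\mathbf{u} = \mathbf{b}$ is a nondegenerate
critical point of $z = g(\mathbf{u})$ if and only if $\mathbf{x} = \mathbf{a} = \mathbf{h}(\mathbf{b})$ is a nondegenerate critical
point of $z = f(\mathbf{x})$.

Proof. We have $f(\mathbf{x}) = f(\mathbf{h}(\mathbf{u})) = g(\mathbf{u})$. Let $H_{\mathbf{a}}$ be the Hessian matrix of $f$ at $\mathbf{a}$, and
let $H^*_{\mathbf{b}}$ be the Hessian matrix of $g$ at $\mathbf{b}$; we must establish a connection between $H_{\mathbf{a}}$
and $H^*_{\mathbf{b}}$ that implies one is invertible precisely when the other is.

The Hessians appear in the respective Taylor expansions of $f$ and $g$:

\[ f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a}) = \tfrac12\,\Delta\mathbf{x}^\dagger H_{\mathbf{a}}\,\Delta\mathbf{x} + O((\Delta\mathbf{x})^3), \]
\[ g(\mathbf{b} + \Delta\mathbf{u}) - g(\mathbf{b}) = \tfrac12\,\Delta\mathbf{u}^\dagger H^*_{\mathbf{b}}\,\Delta\mathbf{u} + O((\Delta\mathbf{u})^3). \]

However,

\[ f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a}) = \Delta z = g(\mathbf{b} + \Delta\mathbf{u}) - g(\mathbf{b}), \]

so we can begin to connect the two Hessians by writing

\[ 2\Delta z = \Delta\mathbf{x}^\dagger H_{\mathbf{a}}\,\Delta\mathbf{x} + O((\Delta\mathbf{x})^3) = \Delta\mathbf{u}^\dagger H^*_{\mathbf{b}}\,\Delta\mathbf{u} + O((\Delta\mathbf{u})^3). \]

Now express $\Delta\mathbf{x}$ in terms of $\Delta\mathbf{u}$ by using the differentiability of $\mathbf{h}$ at $\mathbf{b}$:

\[ \Delta\mathbf{x} = \mathbf{x} - \mathbf{a} = \mathbf{h}(\mathbf{b} + \Delta\mathbf{u}) - \mathbf{h}(\mathbf{b}) = d\mathbf{h}_{\mathbf{b}}(\Delta\mathbf{u}) + o(\Delta\mathbf{u}) = L\,\Delta\mathbf{u} + o(\Delta\mathbf{u}). \]

For visual clarity we have set $d\mathbf{h}_{\mathbf{b}} = L$ here; the remainder is “little oh” of $\Delta\mathbf{u}$. By
Exercise 3.28 (p. 104), $L\,\Delta\mathbf{u} = O(\Delta\mathbf{u})$, so $\Delta\mathbf{x} = O(\Delta\mathbf{u})$.

For every $\Delta\mathbf{u} \ne \mathbf{0}$, write $\Delta\mathbf{u} = s\,\Delta\mathbf{y}$ with $\Delta\mathbf{y}$ a unit vector and a suitable $s > 0$.
Then $O((\Delta\mathbf{u})^3) = O(s^3)$,

\[ \Delta\mathbf{x} = sL\,\Delta\mathbf{y} + o(s) = s\bigl(L\,\Delta\mathbf{y} + o(s)/s\bigr), \qquad O((\Delta\mathbf{x})^3) = O(s^3), \]

and we can write the two expressions for $2\Delta z$ as

\[ s^2\bigl(L\,\Delta\mathbf{y} + o(s)/s\bigr)^\dagger H_{\mathbf{a}} \bigl(L\,\Delta\mathbf{y} + o(s)/s\bigr) + O(s^3) = s^2\,\Delta\mathbf{y}^\dagger H^*_{\mathbf{b}}\,\Delta\mathbf{y} + O(s^3). \]

Now divide the equation by $s^2$ and take the limit as $s \to 0$, using $o(s)/s \to 0$ and
$O(s^3)/s^2 \to 0$. The result is

\[ (L\,\Delta\mathbf{y})^\dagger H_{\mathbf{a}} (L\,\Delta\mathbf{y}) = \Delta\mathbf{y}^\dagger (L^\dagger H_{\mathbf{a}} L)\,\Delta\mathbf{y} = \Delta\mathbf{y}^\dagger H^*_{\mathbf{b}}\,\Delta\mathbf{y} \]

for every unit vector $\Delta\mathbf{y}$. Both matrices are symmetric, so this implies $L^\dagger H_{\mathbf{a}} L = H^*_{\mathbf{b}}$ and hence

\[ \det H^*_{\mathbf{b}} = \det H_{\mathbf{a}}\,(\det L)^2. \]

Because $\mathbf{h}$ is a coordinate change, $\det L = \det d\mathbf{h}_{\mathbf{b}} \ne 0$; therefore $\det H^*_{\mathbf{b}} \ne 0$ if and
only if $\det H_{\mathbf{a}} \ne 0$. □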
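When the coordinate change is linear, the relation between the two Hessians can be checked exactly. The sketch below assumes the quadratic function $f(x_1, x_2) = x_1^2 + x_1 x_2 - \tfrac32 x_2^2$ and an invertible matrix `L`, both chosen only for illustration; for a linear change $\mathbf{x} = L\mathbf{u}$ the Hessian of $g(\mathbf{u}) = f(L\mathbf{u})$ is $L^\dagger H_{\mathbf{a}} L$.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Hessian of the illustrative function f(x1, x2) = x1^2 + x1*x2 - (3/2)*x2^2.
H_a = [[2, 1],
       [1, -3]]

# Hypothetical invertible linear coordinate change x = L u.
L = [[1, 2],
     [0, 3]]

# Hessian of g(u) = f(L u) in the new coordinates.
H_b = matmul(transpose(L), matmul(H_a, L))

print(H_b, det2(H_b), det2(H_a) * det2(L) ** 2)
```

Both determinants are nonzero here, so the critical point at the origin is nondegenerate in either coordinate system.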

The equations $dg_{\mathbf{b}} = df_{\mathbf{a}} \circ d\mathbf{h}_{\mathbf{b}}$ and $\det H^*_{\mathbf{b}} = \det H_{\mathbf{a}}\,(\det d\mathbf{h}_{\mathbf{b}})^2$ in the last two
proofs are the multivariable analogues of the earlier equations $g'(0) = f'(0)\,h'(0)$
and $g''(0) = f''(0)\,(h'(0))^2$ that showed criticality and nondegeneracy were geometric
properties of single-variable functions (p. 222).

We are now ready to state and prove the main theorem. It first appears in an important
paper on the topological properties of multivariable functions that Marston
Morse published in 1925 [13]. Because the theorem was just one of several technical
facts he needed to establish the paper’s main results (now called Morse theory), it
was natural for him to label this fact as a lemma. For us, however, the fact is central,
though it is still always called Morse’s lemma: at a nondegenerate critical point, a
function can always be converted into a sum of squares.

Theorem 7.16 (Morse’s lemma). Suppose $z = f(\mathbf{x})$ has continuous third derivatives
on an open set $X^n$, the point $\mathbf{x} = \mathbf{a}$ in $X^n$ is a nondegenerate critical point of $f$,
and the Hessian matrix $H_{\mathbf{a}}$ has $r$ negative eigenvalues. Then, in a sufficiently small
window $W_{\mathbf{a}}$ centered at $\mathbf{a}$, there is a coordinate change $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$ for which

\[ \Delta z = f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a})
   = -(\Delta u_1)^2 - \cdots - (\Delta u_r)^2 + (\Delta u_{r+1})^2 + \cdots + (\Delta u_n)^2. \]


Because the Hessian $H_{\mathbf{a}}$ is symmetric and the critical point is nondegenerate,
the eigenvalues of $H_{\mathbf{a}}$ are all real and nonzero. If all are positive (i.e., $r = 0$ in
the statement of the theorem), then there are no negative squares in the sum. If all
eigenvalues are negative (i.e., $r = n$), then there are no positive squares in the sum.

The proof of Morse’s lemma breaks up naturally into three parts. In the first
part (Theorem 7.18), a coordinate change reduces the window equation for a function
at a nondegenerate critical point into a simple quadratic form with variable
coefficients. In the second part (Theorem 7.19), a further coordinate change “diagonalizes”
the quadratic form. This means that the form becomes a sum of positive and
negative squares (and the symmetric matrix associated with it becomes a diagonal
matrix). But significantly, it also means that the coefficients of the quadratic form
become constants. In other words, any function “looks like” a sum of squares near
a nondegenerate critical point. The third part of the proof of Morse’s lemma
(Theorem 7.25) shows that the number of negative squares in the sum does not depend on
the way the coordinate changes were chosen, but is always equal to the number of
negative eigenvalues in the Hessian of the given function at its critical point.

Morse begins the proof of the Morse lemma by expanding a function into linear
and quadratic terms in a way that is uncannily similar to Taylor’s expansion. Taylor’s
formula splits the function into three simple pieces (a constant, a linear form,
and a quadratic form) plus a fourth piece that contains the remaining “complexity”
of the function. Morse recasts the formula so there is no separate remainder; the
coefficients of the quadratic form become variable, and contain all the complexity
that Taylor’s formula puts into the remainder. We have already seen Morse’s expansion
put to use: on pages 219-220 we used it to determine the local behavior of a
function of one variable near a critical point. Here, then, for the sake of comparison
are the theorems that provide the expansions of Taylor and Morse.

Theorem 7.17 (Taylor). Suppose $z = f(\mathbf{x})$ has continuous third derivatives on an
open set that contains the line segment from $\mathbf{a}$ to $\mathbf{a} + \Delta\mathbf{x}$. Then

\[ f(\mathbf{a} + \Delta\mathbf{x}) = f(\mathbf{a}) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{a})\,\Delta x_i
   + \frac12 \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{a})\,\Delta x_i\,\Delta x_j + O((\Delta\mathbf{x})^3). \] □


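Theorem 7.18 (Morse). Suppose $z = f(\mathbf{x})$ has continuous third derivatives on an
open set that contains the line segment from $\mathbf{a}$ to $\mathbf{a} + \Delta\mathbf{x}$. Then there are continuously
differentiable functions $h_{ij}(\Delta\mathbf{x}) = h_{ji}(\Delta\mathbf{x})$ for which

\[ f(\mathbf{a} + \Delta\mathbf{x}) = f(\mathbf{a}) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{a})\,\Delta x_i
   + \sum_{i,j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_i\,\Delta x_j, \]

and $h_{ij}(\mathbf{0}) = \dfrac12 \dfrac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{a})$.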


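Proof. For clarity, we separate the proof into a number of steps. One of our aims is to
provide explicit instructions for constructing the coefficients $h_{ij}(\Delta\mathbf{x})$ of the quadratic
form. Note, in what follows, similarities with the proof of Taylor’s theorem.

Step 1. With the following lemma, we are able to build all the terms in Morse’s formula,
including the crucial coefficients $h_{ij}$.

Lemma 7.3. Suppose $z = F(\mathbf{x})$ has continuous derivatives of order $k + 1$ on an open
set that contains the line segment from $\mathbf{a}$ to $\mathbf{a} + \Delta\mathbf{x}$. Then there are functions $p_i(\Delta\mathbf{x})$
with continuous derivatives of order $k$ for which

\[ F(\mathbf{a} + \Delta\mathbf{x}) = F(\mathbf{a}) + \sum_{i=1}^{n} p_i(\Delta\mathbf{x})\,\Delta x_i, \]

and $p_i(\mathbf{0}) = \dfrac{\partial F}{\partial x_i}(\mathbf{a})$, $i = 1, \ldots, n$.

Proof. We express the difference $\Delta z = F(\mathbf{a} + \Delta\mathbf{x}) - F(\mathbf{a})$ as an integral, as in the
beginning of the proof of Taylor’s theorem for a single-variable function (cf. p. 79):

\[ \int_0^1 \frac{d}{dt} F(\mathbf{a} + t\,\Delta\mathbf{x})\,dt = F(\mathbf{a} + t\,\Delta\mathbf{x})\Big|_0^1
   = F(\mathbf{a} + \Delta\mathbf{x}) - F(\mathbf{a}) = \Delta z. \]

In this multivariable setting, the chain rule gives us

\[ \frac{d}{dt} F(\mathbf{a} + t\,\Delta\mathbf{x}) = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(\mathbf{a} + t\,\Delta\mathbf{x})\,\Delta x_i, \]

so

\[ \Delta z = \sum_{i=1}^{n} \left( \int_0^1 \frac{\partial F}{\partial x_i}(\mathbf{a} + t\,\Delta\mathbf{x})\,dt \right) \Delta x_i. \]

Therefore we take

\[ p_i(\Delta\mathbf{x}) = \int_0^1 \frac{\partial F}{\partial x_i}(\mathbf{a} + t\,\Delta\mathbf{x})\,dt. \]

Because $\partial F/\partial x_i$ has continuous derivatives of order $k$, so does $p_i$. Moreover,

\[ p_i(\mathbf{0}) = \int_0^1 \frac{\partial F}{\partial x_i}(\mathbf{a} + t\,\mathbf{0})\,dt
   = \frac{\partial F}{\partial x_i}(\mathbf{a}) \int_0^1 dt = \frac{\partial F}{\partial x_i}(\mathbf{a}). \] □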
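A one-variable case shows how the integral formula for the coefficient produces an exact expansion with no remainder. The choice $F(x) = x^2$ below is only an illustration and is not taken from the text.

```latex
F(x) = x^2: \qquad
p(\Delta x) = \int_0^1 F'(a + t\,\Delta x)\,dt
            = \int_0^1 2(a + t\,\Delta x)\,dt
            = 2a + \Delta x,
\\[4pt]
F(a) + p(\Delta x)\,\Delta x = a^2 + (2a + \Delta x)\,\Delta x = (a + \Delta x)^2 = F(a + \Delta x),
\qquad p(0) = 2a = F'(a).
```

The coefficient $p$ depends on $\Delta x$; that dependence is what absorbs the term a Taylor expansion would place in the remainder.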


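Step 2. Now apply Lemma 7.3 to the function $f$ itself to obtain functions $g_i(\Delta\mathbf{x})$ for
which

\[ f(\mathbf{a} + \Delta\mathbf{x}) = f(\mathbf{a}) + \sum_{i=1}^{n} g_i(\Delta\mathbf{x})\,\Delta x_i. \]

According to the same lemma, each function $g_i$ has continuous second derivatives,
and

\[ g_i(\mathbf{0}) = \frac{\partial f}{\partial x_i}(\mathbf{a}), \]

which gives us a start on Morse’s expansion.

Step 3. Apply Lemma 7.3 again to each $g_i(\Delta\mathbf{x})$, $i = 1, \ldots, n$, this time taking $\mathbf{a} = \mathbf{0}$. We
get functions $h_{ij}(\Delta\mathbf{x})$, $j = 1, \ldots, n$, with continuous first derivatives for which

\[ g_i(\Delta\mathbf{x}) = g_i(\mathbf{0}) + \sum_{j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_j
   = \frac{\partial f}{\partial x_i}(\mathbf{a}) + \sum_{j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_j, \]

and $h_{ij}(\mathbf{0}) = \dfrac{\partial g_i}{\partial x_j}(\mathbf{0})$.

Comment: Nominally, each $g_i$ is a function of the window variables $\Delta x_j$, but because
$\Delta x_j$ and $x_j$ differ merely by a constant ($\Delta x_j = x_j - a_j$), the differential operators

\[ \frac{\partial}{\partial(\Delta x_j)} \quad\text{and}\quad \frac{\partial}{\partial x_j} \]

have the same action. For simplicity we therefore write

\[ \frac{\partial g_i}{\partial x_j} \quad\text{instead of}\quad \frac{\partial g_i}{\partial(\Delta x_j)} \]

here and in all the following work.

Step 4. Now substitute the expression for $g_i(\Delta\mathbf{x})$ into the formula for $f(\mathbf{a} + \Delta\mathbf{x})$ in Step 2:

\[ f(\mathbf{a} + \Delta\mathbf{x}) = f(\mathbf{a}) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{a})\,\Delta x_i
   + \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_i\,\Delta x_j. \]

This looks like Morse’s expansion; in particular, the last term is a quadratic form
with variable coefficients $h_{ij}(\Delta\mathbf{x})$. But nothing in Lemma 7.3 ensures that $h_{ji}(\Delta\mathbf{x}) =
h_{ij}(\Delta\mathbf{x})$ for every $i, j = 1, \ldots, n$, as required by the theorem. (In other words, the
matrix $H(\Delta\mathbf{x}) = (h_{ij}(\Delta\mathbf{x}))$ that defines the quadratic form need not be symmetric.)

Step 5. But we can use Lemma 7.2 to replace the matrix $H$ by the symmetric matrix
$\overline{H} = (H + H^\dagger)/2$ without altering the quadratic form. That is, if we let

\[ \overline{h}_{ij}(\Delta\mathbf{x}) = \frac{h_{ij}(\Delta\mathbf{x}) + h_{ji}(\Delta\mathbf{x})}{2}, \]

then

\[ f(\mathbf{a} + \Delta\mathbf{x}) = f(\mathbf{a}) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{a})\,\Delta x_i
   + \sum_{i=1}^{n} \sum_{j=1}^{n} \overline{h}_{ij}(\Delta\mathbf{x})\,\Delta x_i\,\Delta x_j \]

and $\overline{h}_{ji}(\Delta\mathbf{x}) = \overline{h}_{ij}(\Delta\mathbf{x})$ for all $i, j = 1, \ldots, n$.

Step 6. It remains only to verify that $\overline{h}_{ij}(\mathbf{0}) = \dfrac12 \dfrac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{a})$. We claim that, in fact,

\[ h_{ij}(\mathbf{0}) = \frac12 \frac{\partial^2 f}{\partial x_j\,\partial x_i}(\mathbf{a}) = \frac12 \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{a}) = h_{ji}(\mathbf{0}). \]

To prove the claim, note (Step 3) that $h_{ij}(\mathbf{0}) = \dfrac{\partial g_i}{\partial x_j}(\mathbf{0})$. Therefore, because

\[ g_i(\Delta\mathbf{x}) = \int_0^1 \frac{\partial f}{\partial x_i}(\mathbf{a} + t\,\Delta\mathbf{x})\,dt, \]

we can link $h_{ij}$ to $f$ by calculating the appropriate partial derivative of $g_i$. This
involves differentiation under the integral sign, a delicate matter but one that is
allowed here because the integrand is continuously differentiable; see an introductory
text on real analysis. We have (by the chain rule)

\[ \frac{\partial g_i}{\partial x_j} = \int_0^1 \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i}(\mathbf{a} + t\,\Delta\mathbf{x}) \right) dt
   = \int_0^1 \frac{\partial^2 f}{\partial x_j\,\partial x_i}(\mathbf{a} + t\,\Delta\mathbf{x})\;t\,dt, \]

from which it follows that

\[ h_{ij}(\mathbf{0}) = \frac{\partial g_i}{\partial x_j}(\mathbf{0}) = \int_0^1 \frac{\partial^2 f}{\partial x_j\,\partial x_i}(\mathbf{a})\;t\,dt
   = \frac{\partial^2 f}{\partial x_j\,\partial x_i}(\mathbf{a}) \int_0^1 t\,dt = \frac12 \frac{\partial^2 f}{\partial x_j\,\partial x_i}(\mathbf{a}). \]

This completes the proof of Theorem 7.18, and incidentally shows that, even though
the matrix $H(\Delta\mathbf{x})$ may not be symmetric in general, at least it is when $\Delta\mathbf{x} = \mathbf{0}$. □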

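Let us see how the instructions provided in this proof (in Steps 2 and 3) give us
formulas for the functions $\overline{h}_{ij}$ at a critical point of the “tipped wine bottle” function
$f_m(x, y) = (x^2 + y^2 - 1)^2 + mx$ (Chapter 7.2, pages 231-237). These are the formulas
for $A$, $B$, and $C$ that appear on page 234.

To find the $h_{ij}$, we must first construct the $g_i$, and these involve partial derivatives
of the expression $f_m(\mathbf{x})$ (evaluated at $\mathbf{x} = \mathbf{a} + t\,\Delta\mathbf{x}$). But the window function

\[ \Delta z = f_m(\mathbf{x}) - f_m(\mathbf{a}) \]

has the same partial derivatives; the two functions merely differ by a constant.
Furthermore, when we restrict $\mathbf{a}$ to the form $(p, 0)$ (because we are interested only in
critical points), we can use the expression for $\Delta z$ we have already computed on
page 234:

\[ \Delta z = (6p^2 - 2)(\Delta x)^2 + (2p^2 - 2)(\Delta y)^2
   + 4p(\Delta x)^3 + 4p\,\Delta x(\Delta y)^2 + (\Delta x)^4 + 2(\Delta x)^2(\Delta y)^2 + (\Delta y)^4. \]

Finally, keeping in mind the comment in Step 3 of the last proof, that derivatives
with respect to $\Delta x_i$ and $x_i$ are interchangeable, we can now compute

\[ \frac{\partial(\Delta z)}{\partial(\Delta x)} = 2(6p^2 - 2)\Delta x + 12p(\Delta x)^2 + 4p(\Delta y)^2 + 4(\Delta x)^3 + 4\Delta x(\Delta y)^2, \]
\[ \frac{\partial(\Delta z)}{\partial(\Delta y)} = 2(2p^2 - 2)\Delta y + 8p\,\Delta x\,\Delta y + 4(\Delta x)^2\Delta y + 4(\Delta y)^3. \]

Thus,

\[ g_1(\Delta x, \Delta y) = \int_0^1 \frac{\partial(\Delta z)}{\partial(\Delta x)}(t\,\Delta x, t\,\Delta y)\,dt \]
\[ = \int_0^1 \bigl\{ 2t(6p^2 - 2)\Delta x + 4t^2\bigl[3p(\Delta x)^2 + p(\Delta y)^2\bigr] + 4t^3\bigl[(\Delta x)^3 + \Delta x(\Delta y)^2\bigr] \bigr\}\,dt \]
\[ = (6p^2 - 2)\Delta x + 4p(\Delta x)^2 + \tfrac43 p(\Delta y)^2 + (\Delta x)^3 + \Delta x(\Delta y)^2. \]

In a similar way,

\[ g_2(\Delta x, \Delta y) = \int_0^1 \frac{\partial(\Delta z)}{\partial(\Delta y)}(t\,\Delta x, t\,\Delta y)\,dt
   = (2p^2 - 2)\Delta y + \tfrac83 p\,\Delta x\,\Delta y + (\Delta x)^2\Delta y + (\Delta y)^3. \]

We are now ready to compute the four functions $\overline{h}_{ij}$. By definition, $\overline{h}_{11} = h_{11}$, so

\[ \overline{h}_{11}(\Delta x, \Delta y) = \int_0^1 \frac{\partial g_1}{\partial(\Delta x)}(t\,\Delta x, t\,\Delta y)\,dt \]
\[ = \int_0^1 \bigl\{ [6p^2 - 2] + t[8p\,\Delta x] + t^2\bigl[3(\Delta x)^2 + (\Delta y)^2\bigr] \bigr\}\,dt \]
\[ = (6p^2 - 2) + 4p\,\Delta x + (\Delta x)^2 + \tfrac13(\Delta y)^2. \]

(This is the function $A(\Delta x, \Delta y)$ given on page 234.) Next, notice that

\[ \frac{\partial g_1}{\partial(\Delta y)} = \tfrac83 p\,\Delta y + 2\,\Delta x\,\Delta y = \frac{\partial g_2}{\partial(\Delta x)}, \]

implying $h_{12} = h_{21} = \overline{h}_{12}$. We have

\[ \overline{h}_{12}(\Delta x, \Delta y) = \int_0^1 \frac{\partial g_1}{\partial(\Delta y)}(t\,\Delta x, t\,\Delta y)\,dt
   = \int_0^1 \bigl\{ t\bigl[\tfrac83 p\,\Delta y\bigr] + t^2[2\,\Delta x\,\Delta y] \bigr\}\,dt
   = \tfrac43 p\,\Delta y + \tfrac23\,\Delta x\,\Delta y = B(\Delta x, \Delta y). \]

Finally, because $\overline{h}_{22} = h_{22}$, we have

\[ \overline{h}_{22}(\Delta x, \Delta y) = \int_0^1 \frac{\partial g_2}{\partial(\Delta y)}(t\,\Delta x, t\,\Delta y)\,dt \]
\[ = \int_0^1 \bigl\{ [2p^2 - 2] + t\bigl[\tfrac83 p\,\Delta x\bigr] + t^2\bigl[(\Delta x)^2 + 3(\Delta y)^2\bigr] \bigr\}\,dt \]
\[ = (2p^2 - 2) + \tfrac43 p\,\Delta x + \tfrac13(\Delta x)^2 + (\Delta y)^2. \]

Because this is the function $C(\Delta x, \Delta y)$ given earlier, we have completed the example.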
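The three coefficient functions can be checked against the window function with exact rational arithmetic. The sketch below assumes the critical-point condition $m = 4p(1 - p^2)$ for $\mathbf{a} = (p, 0)$, which follows from setting $\partial f_m/\partial x = 0$ there; the function names are hypothetical.

```python
from fractions import Fraction as Fr

def f_m(x, y, m):
    """The 'tipped wine bottle' function (x^2 + y^2 - 1)^2 + m*x."""
    return (x**2 + y**2 - 1) ** 2 + m * x

def window(p, dx, dy):
    """Delta z = f_m(p + dx, dy) - f_m(p, 0), with m chosen so (p, 0) is critical."""
    m = 4 * p * (1 - p**2)
    return f_m(p + dx, dy, m) - f_m(p, Fr(0), m)

# Coefficient functions A, B, C obtained from the integral construction.
def A(p, dx, dy):
    return (6 * p**2 - 2) + 4 * p * dx + dx**2 + Fr(1, 3) * dy**2

def B(p, dx, dy):
    return Fr(4, 3) * p * dy + Fr(2, 3) * dx * dy

def C(p, dx, dy):
    return (2 * p**2 - 2) + Fr(4, 3) * p * dx + Fr(1, 3) * dx**2 + dy**2

def morse_form(p, dx, dy):
    """Quadratic form with variable coefficients: A dx^2 + 2 B dx dy + C dy^2."""
    return A(p, dx, dy) * dx**2 + 2 * B(p, dx, dy) * dx * dy + C(p, dx, dy) * dy**2

# Arbitrary sample points (p, dx, dy), chosen only for illustration.
samples = [(Fr(1), Fr(1, 2), Fr(-1, 3)), (Fr(-2), Fr(3), Fr(5)), (Fr(1, 2), Fr(0), Fr(7, 4))]
all_match = all(window(*s) == morse_form(*s) for s in samples)
print(all_match)
```

Using `Fraction` avoids floating-point round-off, so equality here is exact rather than approximate.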

We now move on to the next part of the proof of Morse’s lemma. We can assume,
by Theorem 7.18, that our function is already written in window coordinates as a
quadratic form with variable coefficients:

\[ \Delta z = f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a}) = \sum_{i,j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_i\,\Delta x_j, \]

where $h_{ji}(\Delta\mathbf{x}) = h_{ij}(\Delta\mathbf{x})$ and

\[ h_{ij}(\mathbf{0}) = \frac12 \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{a}). \]

Our goal is to “diagonalize” this quadratic form. If the coefficients were constants
instead of functions, then linear algebra would provide a standard diagonalization
method that involves changing coordinates, one variable at a time, by “completing
the square.” We actually use this method because, as Morse pointed out, it works
just as well with variable coefficients.

The first step in completing the square is to divide by the leading coefficient
(this is $h_{11}$ in the quadratic form we are dealing with, and was $A$ in the example we
worked through on pages 234-237); therefore that coefficient must be nonzero. Of
course, we have no reason a priori to expect $h_{11} \ne 0$. Even in the simple example

\[ Q(\Delta x_1, \Delta x_2) = 2\,\Delta x_1\,\Delta x_2, \]

the leading coefficient is zero ($h_{11} = h_{22} = 0$, $h_{12} = h_{21} = 1$). In this case, though,
we can fix the problem with an obvious coordinate change:

\[ \Delta x_1 = \Delta y_1 - \Delta y_2, \qquad \Delta x_2 = \Delta y_1 + \Delta y_2. \]

Then

\[ Q(\Delta x_1, \Delta x_2) = 2\,\Delta x_1\,\Delta x_2 = 2(\Delta y_1)^2 - 2(\Delta y_2)^2 = Q^*(\Delta y_1, \Delta y_2), \]

so the coefficients of the form $Q^*$ that results from the coordinate change are $h^*_{11} = 2$,
$h^*_{12} = h^*_{21} = 0$, $h^*_{22} = -2$. In fact, the following lemma says we can always make
the leading coefficient nonzero, at least if the form is nondegenerate. The lemma
concerns a quadratic form with variable coefficients

\[ Q(\Delta\mathbf{x}) = \sum_{i,j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_i\,\Delta x_j, \qquad h_{ji}(\Delta\mathbf{x}) = h_{ij}(\Delta\mathbf{x}). \]

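Lemma 7.4. Suppose the matrix $(h_{ij}(\mathbf{0}))$ is invertible. Then there is a linear
coordinate change $\Delta\mathbf{x} = L(\Delta\mathbf{y})$ for which

\[ Q(L(\Delta\mathbf{y})) = Q^*(\Delta\mathbf{y}) = \sum_{i,j=1}^{n} h^*_{ij}(\Delta\mathbf{y})\,\Delta y_i\,\Delta y_j, \]

with $h^*_{11}(\mathbf{0}) \ne 0$.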

Proof. Let $H(\Delta\mathbf{x})$ be the symmetric matrix with entry $h_{ij}(\Delta\mathbf{x})$ in the $i$th row and
$j$th column. Suppose first that one of the diagonal elements of $H(\mathbf{0})$ is nonzero, say
$h_{JJ}(\mathbf{0}) \ne 0$. Define

\[ L : \quad \Delta x_1 = \Delta y_J, \quad \Delta x_J = \Delta y_1, \quad \Delta x_k = \Delta y_k, \quad k \ne 1, J. \]

This is a transposition permutation and is its own inverse (and if $J = 1$, it is the
identity). In terms of the new variables, $h^*_{11}(\Delta\mathbf{y}) = h_{JJ}(\Delta\mathbf{x})$, so $h^*_{11}(\mathbf{0}) \ne 0$, and we
are done.

The alternative is that all diagonal elements of $H(\mathbf{0})$ are zero. In that case, some
other element $h_{1j}(\mathbf{0})$, $j = 2, \ldots, n$, in the first row of $H(\mathbf{0})$ must be nonzero.
Otherwise, $\det H(\mathbf{0}) = 0$, which is contrary to hypothesis. So suppose $h_{1J}(\mathbf{0}) = h_{J1}(\mathbf{0}) \ne 0$,
where $J \ne 1$. Define

\[ L : \quad \Delta x_1 = \Delta y_1 - \Delta y_J, \quad \Delta x_J = \Delta y_1 + \Delta y_J, \quad \Delta x_k = \Delta y_k, \quad k \ne 1, J. \]

This $L$ is also invertible; in fact, it is a rotation-dilation of the $(\Delta x_1, \Delta x_J)$-plane.
To determine $h^*_{11}(\Delta\mathbf{y})$, we need to determine all places where $(\Delta y_1)^2$ appears as a
quadratic factor in the form $Q^*(\Delta\mathbf{y}) = Q(\Delta\mathbf{x})$. There are three such places: in $(\Delta x_1)^2$,
in $(\Delta x_J)^2$, and in $\Delta x_1\,\Delta x_J$. We find

\[ h^*_{11}(\Delta\mathbf{y}) = h_{11}(\Delta\mathbf{x}) + h_{JJ}(\Delta\mathbf{x}) + 2h_{1J}(\Delta\mathbf{x}), \]

and thus

\[ h^*_{11}(\mathbf{0}) = h_{11}(\mathbf{0}) + h_{JJ}(\mathbf{0}) + 2h_{1J}(\mathbf{0}). \]

The first two terms are diagonal elements of $H(\mathbf{0})$ and hence, by assumption, are
zero; the remaining term gives $h^*_{11}(\mathbf{0}) = 2h_{1J}(\mathbf{0}) \ne 0$. □
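The two cases of the proof translate directly into a procedure for a constant symmetric matrix. The sketch below is restricted to constant coefficients for simplicity (the lemma itself allows variable ones), and the function name `leading_fix` is hypothetical.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def leading_fix(H):
    """Return a matrix L (x = L y) making the leading coefficient of L^T H L nonzero.

    H is a constant symmetric matrix; indices are 0-based, so the text's
    variable 1 is index 0 here.
    """
    n = len(H)
    L = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    # Case 1: some diagonal entry is nonzero; swap that variable with the first.
    for J in range(n):
        if H[J][J] != 0:
            if J != 0:
                L[0][0] = L[J][J] = 0
                L[0][J] = L[J][0] = 1
            return L
    # Case 2: all diagonal entries vanish; use x1 = y1 - yJ, xJ = y1 + yJ.
    for J in range(1, n):
        if H[0][J] != 0:
            L[0][J] = -1
            L[J][0] = 1
            return L
    raise ValueError("first row of H is zero, so H is not invertible")

# The simple example Q = 2*x1*x2 (all diagonal entries zero).
H = [[0, 1],
     [1, 0]]
L = leading_fix(H)
H_star = matmul(transpose(L), matmul(H, L))

# A hypothetical 3 x 3 case with a nonzero diagonal entry in position 2.
H2 = [[0, 0, 1],
      [0, 5, 0],
      [1, 0, 0]]
H2_star = matmul(transpose(leading_fix(H2)), matmul(H2, leading_fix(H2)))
print(H_star, H2_star[0][0])
```

In the first case the new leading coefficient equals $h_{11} + h_{JJ} + 2h_{1J} = 2h_{1J}$, in agreement with the formula in the proof.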

The following theorem carries out the diagonalization process that gives Az as 
a sum of positive and negative squares. It is the heart of Morse’s lemma but does 
not complete the proof because it does not determine how many of the squares are 
positive and how many are negative. 

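Theorem 7.19. Suppose $z = f(\mathbf{x})$ has continuous third derivatives on an open
set $X^n$, and the point $\mathbf{a}$ in $X^n$ is a nondegenerate critical point of $f$. Then, in a
sufficiently small window $W_{\mathbf{a}}$ centered at $\mathbf{a}$, there is a coordinate change $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$ so
that

\[ \Delta z = f(\mathbf{a} + \Delta\mathbf{x}) - f(\mathbf{a}) = \pm(\Delta u_1)^2 \pm \cdots \pm (\Delta u_n)^2. \]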

Proof. We can assume by Theorem 7.18 (p. 249) that we have already written $\Delta z$ as
a quadratic form with variable coefficients:

\[ \Delta z = \sum_{i,j=1}^{n} h_{ij}(\Delta\mathbf{x})\,\Delta x_i\,\Delta x_j, \]

where $h_{ji}(\Delta\mathbf{x}) = h_{ij}(\Delta\mathbf{x})$ and

\[ h_{ij}(\mathbf{0}) = \frac12 \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{a}). \]

Moreover, because $f$ has continuous third derivatives, the same theorem tells us that
the coefficients $h_{ij}(\Delta\mathbf{x})$ have continuous first derivatives.

This proof also goes in stages; at each stage, a coordinate change “splits off” one
more variable as a perfect square. In other words, we claim that after $k$ stages, the
window equation will look like

\[ \Delta z = \pm(\Delta v_1)^2 \pm \cdots \pm (\Delta v_k)^2 + \sum_{i,j=k+1}^{n} h^*_{ij}(\Delta\mathbf{v})\,\Delta v_i\,\Delta v_j, \]

and that the new coefficients $h^*_{ij}$ are rational functions of the coefficients from the
previous stage. Because each stage is like every other, the proof is a mathematical
induction. Thus, we assume that we have already reached the stage where $k = M - 1$
squares have been “split off” from the quadratic form, and deduce that the next
stage, $k = M$, also holds. (The initial step in the induction is just the one where
$M = 1$, so we do not need to prove it separately.)

Thus we focus on the residual quadratic form

\[ Q_M(\Delta\mathbf{v}) = \sum_{i,j=M}^{n} h^*_{ij}(\Delta\mathbf{v})\,\Delta v_i\,\Delta v_j. \]

Note that this is a quadratic form in just the variables $\Delta v_M, \ldots, \Delta v_n$, although the
coefficients $h^*_{ij}$ remain functions of all the variables $\Delta\mathbf{v} = (\Delta v_1, \ldots, \Delta v_n)$. The leading
coefficient is $h^*_{MM}$, and we can assume (by Lemma 7.4) that $h^*_{MM}(\mathbf{0}) \ne 0$. If we
separate out all appearances of $\Delta v_M$ as a quadratic factor, we get

\[ Q_M(\Delta\mathbf{v}) = h^*_{MM}(\Delta\mathbf{v}) \left( (\Delta v_M)^2 + 2\,\Delta v_M \sum_{j=M+1}^{n} \frac{h^*_{Mj}(\Delta\mathbf{v})}{h^*_{MM}(\Delta\mathbf{v})}\,\Delta v_j \right)
   + \sum_{i,j=M+1}^{n} h^*_{ij}(\Delta\mathbf{v})\,\Delta v_i\,\Delta v_j. \]

Completing the square then gives us

\[ Q_M(\Delta\mathbf{v}) = h^*_{MM}(\Delta\mathbf{v}) \left( \Delta v_M + \sum_{j=M+1}^{n} \frac{h^*_{Mj}(\Delta\mathbf{v})}{h^*_{MM}(\Delta\mathbf{v})}\,\Delta v_j \right)^{2} \]
\[ \qquad {} - h^*_{MM}(\Delta\mathbf{v}) \left( \sum_{j=M+1}^{n} \frac{h^*_{Mj}(\Delta\mathbf{v})}{h^*_{MM}(\Delta\mathbf{v})}\,\Delta v_j \right)^{2}
   + \sum_{i,j=M+1}^{n} h^*_{ij}(\Delta\mathbf{v})\,\Delta v_i\,\Delta v_j. \]

Together, the terms on the second line constitute a new quadratic form in the variables
$\Delta v_{M+1}, \ldots, \Delta v_n$ alone, but with variable coefficients that still depend on all the
$\Delta v_i$, in general. We write that new form as

\[ Q_{M+1}(\Delta\mathbf{v}) = \sum_{i,j=M+1}^{n} \widehat{h}_{ij}(\Delta\mathbf{v})\,\Delta v_i\,\Delta v_j. \]

The formulas show that the new coefficients $\widehat{h}_{ij}$ are rational functions of the $h^*_{ij}$
(in which only $h^*_{MM}$ appears in the denominator). Therefore, the new $\widehat{h}_{ij}$ have continuous
first derivatives wherever the denominator $h^*_{MM}(\Delta\mathbf{v})$ does not vanish. This
confirms the assertion about the coefficients that is part of the induction.

Completion of the square leads us to the coordinate change

\[ \mathbf{h}_M : \begin{cases}
     \Delta w_M = \sqrt{|h^*_{MM}(\Delta\mathbf{v})|}\,\left( \Delta v_M + \displaystyle\sum_{j=M+1}^{n} \frac{h^*_{Mj}(\Delta\mathbf{v})}{h^*_{MM}(\Delta\mathbf{v})}\,\Delta v_j \right), \\[6pt]
     \Delta w_i = \Delta v_i, \quad i \ne M,
   \end{cases} \]

that transforms $Q_M$ into

\[ Q_M(\Delta\mathbf{v}) = \pm(\Delta w_M)^2 + Q_{M+1}(\Delta\mathbf{w}). \]

The new coordinates split off one more variable as a perfect square, and leave a new
residue $Q_{M+1}$ that is again a quadratic form, but with one less quadratic variable.
The coefficients of the new residual form are continuously differentiable functions
of the new coordinates.

The induction is therefore completed as soon as we prove that the map $\mathbf{h}_M$ is a
valid coordinate change. We use the inverse function theorem. First, note that the
components of $\mathbf{h}_M$ have continuous first derivatives near the origin, because they
are rational functions of $\sqrt{|h^*_{MM}(\Delta\mathbf{v})|}$ and $h^*_{Mj}(\Delta\mathbf{v})$, $j = M + 1, \ldots, n$. Next,

\[ d(\mathbf{h}_M)_{\mathbf{0}} = \begin{pmatrix}
     1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
     \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
     0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
     0 & \cdots & 0 & \sqrt{|h^*_{MM}(\mathbf{0})|} & \pm\dfrac{h^*_{M,M+1}(\mathbf{0})}{\sqrt{|h^*_{MM}(\mathbf{0})|}} & \cdots & \pm\dfrac{h^*_{Mn}(\mathbf{0})}{\sqrt{|h^*_{MM}(\mathbf{0})|}} \\
     0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
     \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
     0 & \cdots & 0 & 0 & 0 & \cdots & 1
   \end{pmatrix} \]

(the sign $\pm$ is that of $h^*_{MM}(\mathbf{0})$), so $\det d(\mathbf{h}_M)_{\mathbf{0}} = \sqrt{|h^*_{MM}(\mathbf{0})|} \ne 0$ and the linear map
$d(\mathbf{h}_M)_{\mathbf{0}}$ is invertible. The inverse function theorem then implies that $\mathbf{h}_M$ itself is
invertible on some window $W_M$ centered at $\mathbf{a}$, and the inverse is continuously
differentiable on the image $U_M = \mathbf{h}_M(W_M)$.

The coordinate change $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$ that will carry out the entire diagonalization
is the composite $\mathbf{h} = \mathbf{h}_n \circ \cdots \circ \mathbf{h}_1$ that carries out the individual changes, one after
another. The proof makes it reasonably clear that the composite is well defined, that
is, that we can always carry out the next coordinate change in the sequence.
Alternatively, note that each successive pair $\mathbf{h}_{i+1} \circ \mathbf{h}_i$ of changes is defined on the open
set $U_i \cap W_{i+1}$, which is certainly nonempty because it contains $\mathbf{0} = \mathbf{h}_i(\mathbf{0})$. Finally, by
the chain rule, the composite $\mathbf{h}$ is continuously differentiable. □
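With constant coefficients, the induction in this proof becomes a short loop. The following is a minimal sketch under that simplifying assumption; it also assumes every leading coefficient is nonzero when its stage is reached, so the preliminary change of Lemma 7.4 is not implemented. The function name `complete_squares` and the sample matrix are hypothetical.

```python
from fractions import Fraction

def complete_squares(H):
    """Split off squares one variable at a time from the constant form sum h_ij x_i x_j.

    Returns the leading coefficients met at each stage. After rescaling each new
    variable by sqrt(|leading coefficient|), the form becomes a sum of +/- squares
    whose signs are the signs of these coefficients.
    """
    n = len(H)
    A = [[Fraction(v) for v in row] for row in H]
    leads = []
    for M in range(n):
        lead = A[M][M]
        if lead == 0:
            raise ValueError("zero leading coefficient; a change as in Lemma 7.4 is needed first")
        leads.append(lead)
        # Residual form: h_ij - h_Mi * h_Mj / h_MM for i, j > M.
        for i in range(M + 1, n):
            for j in range(M + 1, n):
                A[i][j] -= A[M][i] * A[M][j] / lead
    return leads

# Hypothetical symmetric matrix, chosen only for illustration.
H = [[2, 1, 0],
     [1, -1, 3],
     [0, 3, 1]]
leads = complete_squares(H)
negatives = sum(1 for c in leads if c < 0)
print(leads, negatives)
```

For this matrix one leading coefficient is negative, so the diagonalized form has one negative square and two positive ones.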

We now come to the final part of the proof of Morse’s lemma, where we show
that the number of negative squares in the new formula

\[ \Delta z = \pm(\Delta u_1)^2 \pm \cdots \pm (\Delta u_n)^2 \]

for $f$ is equal to the number of negative eigenvalues in the Hessian of $f$ at $\mathbf{x} = \mathbf{a}$. In
particular, it follows that this number does not depend on the choices we made in
constructing the coordinate change $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$.

In terms of the new coordinates, $\Delta z$ is a particularly simple quadratic form. For
clarity, let us assume there are $s$ negative squares and the coordinates have been
rearranged so all the negative squares come first in the new formula; then

\[ \Delta z = Q_K(\Delta\mathbf{u}) = \Delta\mathbf{u}^\dagger K\,\Delta\mathbf{u}, \]

where the symmetric matrix $K$ representing the form is

\[ K = \begin{pmatrix}
     -1 & & & & & \\
     & \ddots & & & & \\
     & & -1 & & & \\
     & & & 1 & & \\
     & & & & \ddots & \\
     & & & & & 1
   \end{pmatrix} \]

(with $s$ entries equal to $-1$) and the off-diagonal entries are all zero. We can also
write $\Delta z$ as the Taylor expansion

\[ \Delta z = \tfrac12 H_{\mathbf{a}}(\Delta\mathbf{x}) + O((\Delta\mathbf{x})^3) = \tfrac12\,\Delta\mathbf{x}^\dagger H_{\mathbf{a}}\,\Delta\mathbf{x} + O((\Delta\mathbf{x})^3), \]

in which $H_{\mathbf{a}}$ is the Hessian of $f$ at the critical point in terms of the original $\mathbf{x}$
coordinates. The next theorem provides the first link between the matrices $K$ and $H_{\mathbf{a}}$.

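Theorem 7.20. Let $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$ be the coordinate change of Theorem 7.19, and let
$L = d\mathbf{h}_{\mathbf{0}}$ be the derivative of $\mathbf{h}$ at $\Delta\mathbf{x} = \mathbf{0}$; then

\[ L^\dagger K L = \tfrac12 H_{\mathbf{a}}. \]

Proof. The proof is left as an exercise; it is similar to the earlier proof
(Theorem 7.15, p. 247) that connects the Hessians of equivalent functions at
corresponding critical points. □

Corollary 7.21 The matrices $\tfrac12 H_{\mathbf{a}}$ and $K$ represent the same quadratic form in the
coordinates $\Delta\mathbf{x}$ and $\Delta\mathbf{u}$, respectively.

Proof. Let $L = d\mathbf{h}_{\mathbf{0}}$, as in Theorem 7.20; then the linear coordinate change $\Delta\mathbf{u} = L\,\Delta\mathbf{x}$
converts $Q_K(\Delta\mathbf{u}) = \Delta\mathbf{u}^\dagger K\,\Delta\mathbf{u}$ into

\[ \overline{Q}_K(\Delta\mathbf{x}) = Q_K(L\,\Delta\mathbf{x}) = (L\,\Delta\mathbf{x})^\dagger K (L\,\Delta\mathbf{x}) = \Delta\mathbf{x}^\dagger L^\dagger K L\,\Delta\mathbf{x} = \tfrac12\,\Delta\mathbf{x}^\dagger H_{\mathbf{a}}\,\Delta\mathbf{x}. \] □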

The corollary leads us to regard a quadratic form as a fixed geometric object that
has different representations in different coordinate systems. In other words, if we
give a “geometric” vector $\mathbf{v}$ coordinates in two different ways,

\[ \Delta\mathbf{x} \longleftrightarrow \mathbf{v} \longleftrightarrow \Delta\mathbf{u}, \]

then there is a function $Q$ (the “geometric” quadratic form) defined on such vectors
for which

\[ \overline{Q}_K(\Delta\mathbf{x}) = Q(\mathbf{v}) = Q_K(\Delta\mathbf{u}). \]

What properties do $\overline{Q}_K$ and $Q_K$ have in common? These are the geometric properties
of the underlying function $Q$.

Definition 7.9 We say the quadratic form $Q$ is negative definite (respectively,
positive definite) on a set $S$ in $\mathbb{R}^n$ if $Q(\mathbf{v}) < 0$ (respectively, $Q(\mathbf{v}) > 0$) for every $\mathbf{v} \ne \mathbf{0}$
in $S$.

Definition 7.10 The index of the quadratic form $Q$ is the maximum dimension of a
subspace $N$ of $\mathbb{R}^n$ on which $Q$ is negative definite.

The index of a quadratic form is defined without reference to its representation in a 
particular coordinate system, so it is a geometric property of the form. 

Theorem 7.22. The index of the quadratic form

\[ Q_K(\Delta\mathbf{u}) = -(\Delta u_1)^2 - \cdots - (\Delta u_s)^2 + (\Delta u_{s+1})^2 + \cdots + (\Delta u_n)^2 \]

is equal to $s$.

Proof. Let $N^s$ be the $s$-dimensional linear subspace of vectors of the form

\[ \Delta\mathbf{u} = (\Delta u_1, \ldots, \Delta u_s, 0, \ldots, 0). \]

Then

\[ Q_K(\Delta\mathbf{u}) = -(\Delta u_1)^2 - \cdots - (\Delta u_s)^2 < 0 \]

for every nonzero $\Delta\mathbf{u}$ in $N^s$, implying that the index of $Q_K$ is at least equal to $s$.

We are done if we show that the index of $Q_K$ is at most equal to $s$. First note
that, by a similar argument, $Q_K$ is positive definite on the $(n - s)$-dimensional linear
subspace $P^{n-s}$ of vectors of the form

\[ \Delta\mathbf{u} = (0, \ldots, 0, \Delta u_{s+1}, \ldots, \Delta u_n). \]

Now suppose the index of $Q_K$ were greater than $s$. Then there would be a linear
subspace $N^{s+1}$ of dimension $s + 1$ on which $Q_K$ were negative definite. But then
the intersection $P^{n-s} \cap N^{s+1}$ would be a linear subspace of dimension at least 1,
and would thus contain nonzero vectors. The value of $Q_K$ on such a vector would
be both positive and negative, an absurdity we attribute to the assumption that the
index could be greater than $s$. We reject the assumption and conclude that the index
is exactly $s$. □

By definition, the index of a quadratic form is independent of the coordinate 
representation. Therefore, because $Q_{\mathbf{a}}$ and $\widehat{Q}_{\mathbf{a}}$ represent the same form in different 
coordinates, $Q_{\mathbf{a}}$ and $\widehat{Q}_{\mathbf{a}}$ must have the same index. Consequently, the Hessian form 

$$Q_{\mathbf{a}}(\Delta\mathbf{x}) = \tfrac{1}{2}\,\Delta\mathbf{x}^\dagger H_{\mathbf{a}}\,\Delta\mathbf{x}$$

must have index $s$. It remains to show that $s$ equals the number of negative 
eigenvalues of $H_{\mathbf{a}}$. This involves “transforming $H_{\mathbf{a}}$ to principal axes” (cf. p. 242): using an 
$n$-dimensional rotation to reduce the Hessian form $Q_{\mathbf{a}}$ to a sum of squares in which 
the eigenvalues of $H_{\mathbf{a}}$ appear as coefficients. 

Definition 7.11 An $n \times n$ invertible matrix $P$ is orthogonal if its transpose equals 
its inverse: $P^\dagger = P^{-1}$. 


Qk and Q K both 
have index r 


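An orthogonal matrix gets its name from the fact that its columns are mutually 
orthogonal unit vectors. That is, if we write 

$$P = \begin{pmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_n \end{pmatrix}, \qquad P^\dagger = \begin{pmatrix} \mathbf{w}_1^\dagger \\ \vdots \\ \mathbf{w}_n^\dagger \end{pmatrix},$$

then the condition $P^\dagger P = I$ implies $\|\mathbf{w}_i\|^2 = \mathbf{w}_i^\dagger \mathbf{w}_i = 1$ for every $i = 1, \ldots, n$ and 
$\mathbf{w}_i^\dagger \mathbf{w}_j = 0$ for every $i \ne j$. 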

Rotations and 
orthogonal matrices 

Let $\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_n$ be the unit $n$-cube whose edges are the standard basis vectors 
in $\mathbb{R}^n$. Then 

$$P(\mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_n) = P(\mathbf{e}_1) \wedge \cdots \wedge P(\mathbf{e}_n) = \mathbf{w}_1 \wedge \cdots \wedge \mathbf{w}_n,$$

so (Definition 2.5, p. 46) $\operatorname{vol}\, \mathbf{w}_1 \wedge \cdots \wedge \mathbf{w}_n = \det P = \pm 1$ for every orthogonal 
matrix $P$. 

The theorem stated 
in terms of matrices 


Definition 7.12 An orthogonal matrix $P$ is a rotation if $\det P = +1$. 

Thus, a rotation is an orthogonal matrix that preserves orientation. If $P$ is orthogonal 
but $\det P = -1$, $P$ can be converted into a rotation by changing the signs of the 
entries in any one of its columns. 

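Theorem 7.23 (Principal axes theorem). If $Q(\mathbf{x}) = \mathbf{x}^\dagger M \mathbf{x}$ is a quadratic form in 
$n$ variables, then there is a rotation $\mathbf{x} = R\mathbf{u}$ of $\mathbb{R}^n$ that transforms $Q$ into a sum of 
squares 

$$Q(R\mathbf{u}) = Q^*(\mathbf{u}) = \lambda_1 u_1^2 + \cdots + \lambda_n u_n^2,$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $M$. 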

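Proof. The theorem asserts that, for any $n \times n$ symmetric matrix $M$, there is a 
rotation $R$ for which 

$$R^{-1} M R = R^\dagger M R = D$$

is a diagonal matrix whose diagonal elements are the eigenvalues of $M$. We prove 
the theorem in this form by using mathematical induction on $n$. 

If $n = 1$ there is nothing to do; we can take $R$ to be the $1 \times 1$ identity matrix. 
Now assume that any $(n-1) \times (n-1)$ symmetric matrix can be diagonalized by a 
suitable rotation on $\mathbb{R}^{n-1}$, and consider an $n \times n$ symmetric matrix $M$. 

Let $\mathbf{u}_1$ be an eigenvector of $M$, $M\mathbf{u}_1 = \lambda_1 \mathbf{u}_1$, and take $\mathbf{u}_1$ to be a unit vector. 
Extend $\mathbf{u}_1$ to an orthonormal basis $\{\mathbf{u}_1, \mathbf{w}_1, \ldots, \mathbf{w}_{n-1}\}$ of $\mathbb{R}^n$. We may assume (by 
changing the sign of $\mathbf{w}_{n-1}$, if necessary) that the $n$-cube $\mathbf{u}_1 \wedge \mathbf{w}_1 \wedge \cdots \wedge \mathbf{w}_{n-1}$ has 
positive orientation, and thus that the matrix 

$$R_1 = \begin{pmatrix} \mathbf{u}_1 & \mathbf{w}_1 & \cdots & \mathbf{w}_{n-1} \end{pmatrix}$$

is a rotation. Then 

$$R_1^\dagger M R_1 = \begin{pmatrix} \mathbf{u}_1^\dagger \\ \mathbf{w}_1^\dagger \\ \vdots \\ \mathbf{w}_{n-1}^\dagger \end{pmatrix} \begin{pmatrix} M\mathbf{u}_1 & M\mathbf{w}_1 & \cdots & M\mathbf{w}_{n-1} \end{pmatrix} = \begin{pmatrix} \lambda_1 \mathbf{u}_1^\dagger \mathbf{u}_1 & \mathbf{u}_1^\dagger M \mathbf{w}_1 & \cdots & \mathbf{u}_1^\dagger M \mathbf{w}_{n-1} \\ \lambda_1 \mathbf{w}_1^\dagger \mathbf{u}_1 & \mathbf{w}_1^\dagger M \mathbf{w}_1 & \cdots & \mathbf{w}_1^\dagger M \mathbf{w}_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1 \mathbf{w}_{n-1}^\dagger \mathbf{u}_1 & \mathbf{w}_{n-1}^\dagger M \mathbf{w}_1 & \cdots & \mathbf{w}_{n-1}^\dagger M \mathbf{w}_{n-1} \end{pmatrix}$$

$$= \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & m^*_{11} & \cdots & m^*_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & m^*_{n-1,1} & \cdots & m^*_{n-1,n-1} \end{pmatrix}, \qquad m^*_{ij} = \mathbf{w}_i^\dagger M \mathbf{w}_j.$$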
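The zeros appear in the first column because every $\mathbf{w}_i \perp \mathbf{u}_1$, and in the first row 
because $R_1^\dagger M R_1$ is symmetric. Also, 

$$m^*_{ij} = \mathbf{w}_i^\dagger M \mathbf{w}_j = (\mathbf{w}_i^\dagger M \mathbf{w}_j)^\dagger = \mathbf{w}_j^\dagger M \mathbf{w}_i = m^*_{ji},$$

so $M^* = (m^*_{ij})$ is an $(n-1) \times (n-1)$ symmetric matrix. 

By the induction hypothesis, there is a rotation $R^*$ that diagonalizes $M^*$; that is, 
$(R^*)^\dagger M^* R^* = D^*$. Let 

$$R_2 = \begin{pmatrix} 1 & \\ & R^* \end{pmatrix};$$

this is the $n \times n$ matrix with 1 and $R^*$ on the diagonal, and with all off-diagonal 
elements not shown equal to zero. Then 

$$R_2^\dagger = \begin{pmatrix} 1 & \\ & (R^*)^\dagger \end{pmatrix} \quad\text{and}\quad R_2^\dagger R_2 = \begin{pmatrix} 1 & \\ & (R^*)^\dagger R^* \end{pmatrix} = I,$$

the $n \times n$ identity matrix; thus $R_2$ is orthogonal. Moreover, $\det R_2 = 1 \times \det R^* = 1$, 
so $R_2$ is a rotation. 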

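Lemma 7.5. The matrix $R = R_1 R_2$ is a rotation and diagonalizes $M$. 

Proof. We know $R$ is a rotation because it is a product of rotations. Moreover, 

$$R^\dagger M R = R_2^\dagger R_1^\dagger M R_1 R_2 = R_2^\dagger \begin{pmatrix} \lambda_1 & \\ & M^* \end{pmatrix} R_2 = \begin{pmatrix} 1 & \\ & (R^*)^\dagger \end{pmatrix} \begin{pmatrix} \lambda_1 & \\ & M^* \end{pmatrix} \begin{pmatrix} 1 & \\ & R^* \end{pmatrix}$$

$$= \begin{pmatrix} \lambda_1 & \\ & (R^*)^\dagger M^* R^* \end{pmatrix} = \begin{pmatrix} \lambda_1 & \\ & D^* \end{pmatrix};$$

this is an $n \times n$ diagonal matrix. □ 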

Lemma 7.6. If $M$ is symmetric, $P$ is orthogonal, and $P^\dagger M P$ is a diagonal matrix 
with diagonal elements $\alpha_i$, $i = 1, \ldots, n$, then $\alpha_i$ is an eigenvalue of $M$ and the $i$-th 
column of $P$ is a corresponding eigenvector. 

Proof. This is the $n$-dimensional version of Theorem 7.7, page 241, and has a 
similar proof. The key is that $P^\dagger M P = D$ implies $MP = PD$ because $P^\dagger = P^{-1}$; see the 
exercises. □ 

Thus the diagonal elements in the diagonal matrix of Lemma 7.5 are the eigenvalues 
of $M$, and the proof of the principal axes theorem is complete. □ 
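For a form in two variables the rotation to principal axes can be written down explicitly, which gives a quick numerical check of the theorem. The following is only a sketch under our own assumptions: the function name `principal_axes_2x2` is hypothetical, and the closed-form angle $\theta = \tfrac12 \operatorname{atan2}(2b, a-c)$ is the standard two-variable formula rather than the inductive construction used in the proof above.

```python
import math

def principal_axes_2x2(a, b, c):
    """Rotate Q(x, y) = a x^2 + 2 b x y + c y^2 to principal axes.

    M = [[a, b], [b, c]] is the symmetric matrix of Q, and R is the rotation
    through the angle theta, R = [[cos, -sin], [sin, cos]].
    Returns (theta, lam1, lam2, off), where lam1, lam2 are the diagonal
    entries of R^T M R and off is its off-diagonal entry.
    """
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    co, si = math.cos(theta), math.sin(theta)
    # Diagonal entries u1^T M u1 and u2^T M u2 for the columns
    # u1 = (co, si) and u2 = (-si, co) of R.
    lam1 = a * co * co + 2.0 * b * co * si + c * si * si
    lam2 = a * si * si - 2.0 * b * co * si + c * co * co
    # Off-diagonal entry u1^T M u2; it vanishes for this choice of theta.
    off = (c - a) * co * si + b * (co * co - si * si)
    return theta, lam1, lam2, off

# Example: Q(x, y) = 2x^2 + 2xy + 2y^2, so M = [[2, 1], [1, 2]].
theta, lam1, lam2, off = principal_axes_2x2(2.0, 1.0, 2.0)
print(theta, lam1, lam2, off)  # theta = pi/4, eigenvalues 3 and 1, off ~ 0
```

Because $R$ here is a rotation matrix by construction, $\det R = +1$ automatically; no column needs its sign changed.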

Our proof of the principal axes theorem indicates that the eigenvectors of a 
symmetric matrix have properties not shared by those of matrices in general. Before 
returning to the analysis of critical points, we pause to establish some of those properties. 


Interlude 


Definition 7.13 The eigenvalue $\alpha$ of the matrix $M$ has multiplicity $k$ if the factor 
$\lambda - \alpha$ appears $k$ times in a factorization of the characteristic polynomial of $M$. 
Index of inertia 


Although each distinct real eigenvalue of an $n \times n$ real matrix $M$ has a real 
eigenvector associated with it (Theorem 7.13, p. 246), in general a repeated eigenvalue 
may not possess additional linearly independent eigenvectors. Consequently, the 
eigenvectors of such an $n \times n$ matrix may not span $\mathbb{R}^n$. The first such examples we 
saw were the shears $M_7$ and $M_8$ of Chapter 2 (pp. 38ff). However, a symmetric 
matrix has a “complete” set of eigenvectors. 

Corollary 7.24 Suppose $M$ is a symmetric matrix with an eigenvalue $\alpha$ of 
multiplicity $k$. Then the eigenvectors associated with $\alpha$ form a $k$-dimensional subspace $E_\alpha$ 
of $\mathbb{R}^n$. 


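Proof. Sums and scalar multiples of eigenvectors associated with $\alpha$ are again 
eigenvectors associated with $\alpha$, so they form a subspace $E_\alpha$ of $\mathbb{R}^n$. Because $\alpha$ has 
multiplicity $k$, precisely $k$ columns of the orthogonal matrix $P$ of Lemma 7.6 are 
eigenvectors associated with $\alpha$. Those columns are linearly independent, so the dimension 
of $E_\alpha$ is at least $k$. 

We must now show the dimension of $E_\alpha$ is not greater than $k$. Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be 
the columns of $P$; assume, by rearranging them if necessary, that the first $k$ columns, 
$\mathbf{v}_1, \ldots, \mathbf{v}_k$, are the eigenvectors associated with $\alpha$. Suppose $\mathbf{w}$ is in $E_\alpha$; because 
$\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is an orthonormal basis, we can write 

$$\mathbf{w} = v_1 \mathbf{v}_1 + \cdots + v_n \mathbf{v}_n,$$

where $v_j = \mathbf{w}^\dagger \mathbf{v}_j = \mathbf{v}_j^\dagger \mathbf{w}$ by orthonormality of the basis. Let $\alpha_j$ be the eigenvalue 
associated with $\mathbf{v}_j$; then $\alpha_j = \alpha$ if and only if $j = 1, \ldots, k$. We have 

$$\alpha_j v_j = \alpha_j \mathbf{w}^\dagger \mathbf{v}_j = \mathbf{w}^\dagger M \mathbf{v}_j = (\mathbf{w}^\dagger M \mathbf{v}_j)^\dagger = \mathbf{v}_j^\dagger M \mathbf{w} = \alpha\, \mathbf{v}_j^\dagger \mathbf{w} = \alpha v_j;$$

here $M\mathbf{w} = \alpha\mathbf{w}$ because $\mathbf{w}$ is in $E_\alpha$. Thus $(\alpha_j - \alpha) v_j = 0$. This forces $v_j = 0$ for $j > k$, 
implying that $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ spans $E_\alpha$. Hence $\dim E_\alpha = k$. □ 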

The subspace $E_\alpha$ is sometimes called the eigenspace associated with $\alpha$. Thus, for 
a symmetric matrix, the dimension of the eigenspace associated with an eigenvalue 
equals the multiplicity of that eigenvalue. 
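The contrast between a symmetric matrix and a shear can be checked numerically. The sketch below rests on the standard fact that the eigenspace $E_\alpha$ is the null space of $M - \alpha I$, so its dimension is $n$ minus the rank of that matrix. The helper names `rank` and `eigenspace_dim`, the tolerance, and the two sample matrices are our own choices for illustration, not taken from the text.

```python
def rank(A, tol=1e-9):
    """Rank of a matrix (list of rows) by Gaussian elimination with pivoting."""
    A = [[float(x) for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        # Choose the largest available pivot in column c, at or below row r.
        p = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < tol:
            continue  # no pivot in this column
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= f * A[r][j]
        r += 1
        if r == rows:
            break
    return r

def eigenspace_dim(M, alpha):
    """Dimension of the null space of M - alpha*I."""
    n = len(M)
    shifted = [[M[i][j] - (alpha if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

# Symmetric: characteristic polynomial (lambda - 2)^2 (lambda - 5).
S = [[3, 1, 1], [1, 3, 1], [1, 1, 3]]
print(eigenspace_dim(S, 2), eigenspace_dim(S, 5))  # 2 1

# A shear: eigenvalue 1 has multiplicity 2 but only a line of eigenvectors.
shear = [[1, 1], [0, 1]]
print(eigenspace_dim(shear, 1))  # 1
```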

To complete the third part of the proof of Morse’s lemma, we must show that, 
whenever a coordinate change reduces $\Delta z$ to a sum of squares, the number of 
negative squares in that sum is always equal to the number of negative eigenvalues of the 
Hessian matrix $H_{\mathbf{a}}$. For a quadratic form with constant coefficients under a linear 
coordinate change, the invariance of the number of negative squares and positive 
squares was shown by J. J. Sylvester in 1852 [18]. He characterized the result as 
“... a law to which my view of the physical meaning of quantity of matter inclines 
me, upon the ground of analogy, to give the name of the Law of Inertia for Quadratic 
Forms, as expressing the fact of the existence of an invariable number inseparably 
attached to such forms.” 

Sometimes, to underscore the invariant nature of the index of a quadratic form, 
we add Sylvester’s term and call it the index of inertia. 
Theorem 7.25. Suppose $z = f(\mathbf{x})$ has continuous third derivatives on an open set 
$X^n$, $\mathbf{x} = \mathbf{a}$ is a nondegenerate critical point of $f$ in $X^n$, and the Hessian matrix $H_{\mathbf{a}}$ of 
$f$ at $\mathbf{a}$ has $r$ negative eigenvalues. If $\Delta\mathbf{u} = \mathbf{h}(\Delta\mathbf{x})$ is a coordinate change in a window 
centered at $\mathbf{a}$ that reduces $\Delta z$ to a sum of squares, 

$$\Delta z = -(\Delta u_1)^2 - \cdots - (\Delta u_s)^2 + (\Delta u_{s+1})^2 + \cdots + (\Delta u_n)^2,$$

then $s = r$. 

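Proof. By Theorem 7.20 (p. 258), the linear map $\Delta\mathbf{u} = d\mathbf{h}_{\mathbf{0}}(\Delta\mathbf{x})$ converts the 
quadratic form 

$$\Delta z = \widehat{Q}_{\mathbf{a}}(\Delta\mathbf{u}) = -(\Delta u_1)^2 - \cdots - (\Delta u_s)^2 + (\Delta u_{s+1})^2 + \cdots + (\Delta u_n)^2$$

into 

$$\Delta z = Q_{\mathbf{a}}(\Delta\mathbf{x}) = \tfrac{1}{2}\,\Delta\mathbf{x}^\dagger H_{\mathbf{a}}\,\Delta\mathbf{x}.$$

Therefore, $\widehat{Q}_{\mathbf{a}}$ and $Q_{\mathbf{a}}$ are just different coordinate representations of the same 
(geometric) quadratic form $Q$, and must therefore have the same index. By 
Theorem 7.22, the index of $\widehat{Q}_{\mathbf{a}}$ is $s$. By transforming $Q_{\mathbf{a}}$ to principal axes (Theorem 7.23), 
we see the index of $Q_{\mathbf{a}}$ is $r$. Thus $s = r$. □ 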
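The invariance of the index under a linear coordinate change can be tested on a small example. The sketch below is an illustration under stated assumptions, not the method of the proof: it reduces a symmetric matrix to a sum of squares by elimination without row exchanges (completing the square), which works only when no zero pivot appears, and counts the negative pivots. The names `pivots`, `index_of_form`, and `congruent`, and the matrices `M` and `C`, are hypothetical choices of ours.

```python
def pivots(M):
    """Pivots d_1, ..., d_n of symmetric M from elimination without row swaps.

    When every pivot is nonzero, x^T M x = sum_k d_k * y_k^2 in suitable
    linear coordinates y, so the signs of the d_k give the signs of the squares.
    """
    A = [[float(x) for x in row] for row in M]
    n = len(A)
    d = []
    for k in range(n):
        if abs(A[k][k]) < 1e-12:
            raise ValueError("zero pivot: this simple sketch does not apply")
        d.append(A[k][k])
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return d

def index_of_form(M):
    """Number of negative squares, i.e. the index of x^T M x."""
    return sum(1 for p in pivots(M) if p < 0)

def congruent(M, C):
    """Matrix C^T M C of the same form after the coordinate change x = C u."""
    n = len(M)
    MC = [[sum(M[i][k] * C[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(C[k][i] * MC[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M = [[1, 2, 0], [2, 1, 0], [0, 0, -1]]   # eigenvalues 3, -1, -1
C = [[1, 1, 0], [0, 1, 1], [0, 0, 2]]    # invertible (det = 2), not a rotation
print(pivots(M), index_of_form(M))                             # [1.0, -3.0, -1.0] 2
print(pivots(congruent(M, C)), index_of_form(congruent(M, C)))  # [1.0, -3.0, -4.0] 2
```

The pivots themselves change with the coordinates, but the number of negative ones does not; that count is what Sylvester's law of inertia says must be preserved.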

This completes the third part of the proof of Morse’s lemma, and thus completes 
the entire proof. □ 

One of the consequences of Morse’s lemma is that the second derivatives of a 
function at a nondegenerate critical point determine the type of that point. In fact, 
the type is completely characterized by a single number: the index of inertia of its 
Hessian form. This leads to the following definition and theorem. 

Definition 7.14 The index, or index of inertia, of a nondegenerate critical point of 
a function is the index of its Hessian, that is, the number of negative eigenvalues of 
the Hessian matrix at that point. 

Theorem 7.26 (Second derivative test). Suppose $\mathbf{x} = \mathbf{a}$ is a nondegenerate critical 
point of a function $z = f(\mathbf{x})$ that possesses continuous third derivatives. If $r$ is the 
index of $\mathbf{a}$, then 

• $\mathbf{a}$ is a local minimum if $r = 0$. 

• $\mathbf{a}$ is a local maximum if $r = n$. 

• $\mathbf{a}$ is a saddle if $0 < r < n$. □ 
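In two variables the test is easy to carry out by machine, because the eigenvalues of a symmetric $2 \times 2$ Hessian have a closed form. The following sketch assumes the Hessian is supplied as $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$; the function names and the tolerance used to detect a zero eigenvalue are our own choices.

```python
import math

def hessian_index_2x2(a, b, c, tol=1e-12):
    """Index of the Hessian [[a, b], [b, c]]: its number of negative eigenvalues.

    Returns None when an eigenvalue is (numerically) zero, i.e. when the
    critical point is degenerate and the second derivative test is silent.
    """
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    eigenvalues = (mean - radius, mean + radius)
    if any(abs(lam) < tol for lam in eigenvalues):
        return None
    return sum(1 for lam in eigenvalues if lam < 0)

def classify(a, b, c):
    """Apply the second derivative test with n = 2."""
    r = hessian_index_2x2(a, b, c)
    if r is None:
        return "degenerate"
    return {0: "local minimum", 1: "saddle", 2: "local maximum"}[r]

print(classify(2, 0, 2))    # local minimum   (eigenvalues 2, 2)
print(classify(-2, 1, -2))  # local maximum   (eigenvalues -3, -1)
print(classify(1, 2, 1))    # saddle          (eigenvalues -1, 3)
print(classify(1, 1, 1))    # degenerate      (eigenvalues 0, 2)
```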

Morse’s lemma and the second derivative test classify nondegenerate critical 
points: there are $n + 1$ classes, one for each possible index. Two critical points are in 
the same class if a coordinate change will transform one into the other. For 
degenerate critical points, the situation is very different. There are infinitely many classes, 
and no complete classification exists, although there are partial results. The analysis 
of (degenerate) critical points is part of the larger study of singularities of mappings, 
an active area of current research. 


Proof is complete 


Index of a 
critical point 


Classifying 
critical points 
Nonisolated critical 
points are degenerate 
second derivatives 


One useful observation we can make is that a nonisolated critical point (for 
example, a point on the ring of minima of the wine bottle) is necessarily degenerate. 
The proof (by Morse) is a nice application of the inverse function theorem. 

Theorem 7.27. Suppose $\mathbf{x} = \mathbf{a}$ is a nondegenerate critical point of a function 
$z = f(\mathbf{x})$ that has continuous second derivatives in some neighborhood $X^n$ of $\mathbf{a}$. Then $\mathbf{a}$ 
is isolated, in the sense that there is some nonempty open ball $B_\varepsilon$ centered at $\mathbf{a}$ that 
contains no other critical point of $f$. 

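Proof. The gradient of $f$ defines a map $\nabla f : X^n \to \mathbb{R}^n$, 

$$\nabla f : \begin{cases} u_1 = \dfrac{\partial f}{\partial x_1}(\mathbf{x}), \\ \quad\vdots \\ u_n = \dfrac{\partial f}{\partial x_n}(\mathbf{x}), \end{cases}$$

that is continuously differentiable, because $f$ is twice continuously differentiable. 

By construction, a point $\mathbf{b}$ is a critical point of $f$ if and only if $\nabla f(\mathbf{b}) = \mathbf{0}$; in 
particular, $\nabla f(\mathbf{a}) = \mathbf{0}$. Furthermore, the matrix of the derivative $d(\nabla f)_{\mathbf{a}}$ coincides with 
the Hessian matrix $H_{\mathbf{a}}$, so the nondegeneracy of $\mathbf{a}$ implies that $d(\nabla f)_{\mathbf{a}}$ is invertible. 
The inverse function theorem then implies that the map $\nabla f$ itself is invertible on 
some open ball $B_\varepsilon$ centered at $\mathbf{a}$. In particular, $\nabla f$ is 1-1 there, so no point $\mathbf{b} \ne \mathbf{a}$ is 
mapped to $\mathbf{0}$. That is, no point $\mathbf{b} \ne \mathbf{a}$ in $B_\varepsilon$ is a critical point of $f$. □ 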

Earlier (see pp. 222-224), we observed that the value of the second derivative 
at a regular point of a single-variable function could be transformed into any new 
value whatsoever by a suitable coordinate change. At a critical point, this degree of 
volatility does not occur: the sign of the second derivative cannot be changed. In 
effect, the convexity of a function graph is a geometric invariant at a critical point 
but not at a regular point. There is a similar distinction between the regular and the 
critical points of a function of several variables. For suppose $\mathbf{x} = \mathbf{a}$ is a regular point 
of $z = f(\mathbf{x})$. By the implicit function theorem (in particular, Corollary 6.8, p. 198), 
local coordinates $(\Delta u_1, \ldots, \Delta u_n)$ can be chosen near $\mathbf{a}$ so that $\Delta z = \Delta u_n$. Thus, in 
terms of the new variables, the function is linear, and all of its second derivatives 
are identically zero. Whatever information we thought might be conveyed by the 
original derivatives $\partial^2 f / \partial x_i\, \partial x_j$ has vanished with the coordinate change. 

By contrast, suppose $\mathbf{x} = \mathbf{a}$ is a critical point of $z = f(\mathbf{x})$. When $\mathbf{a}$ is 
nondegenerate, Morse’s lemma tells us the index of inertia of the Hessian of $f$ at $\mathbf{a}$ is a 
geometric invariant. That is, if $\mathbf{x} = \mathbf{h}(\mathbf{u})$ is a local coordinate change near $\mathbf{a} = \mathbf{h}(\mathbf{b})$ 
that transforms $f$ into $g(\mathbf{u}) = f(\mathbf{h}(\mathbf{u})) = f(\mathbf{x})$, then 

• $\mathbf{u} = \mathbf{b}$ is a nondegenerate critical point of $z = g(\mathbf{u})$. 

• The index of inertia of the Hessian of $g$ at $\mathbf{b}$ equals the index of inertia of the Hessian of $f$ at $\mathbf{a}$. 

If a is degenerate, then the rank of the Hessian is not maximal. In this case, though, 
an extension of our methods can show that the rank and the index of inertia are both 
geometric invariants. 
Exercises 


7.1. Suppose $y = f(x)$ has a continuous second derivative and $f'(0) \ne 0$. 

a. For any value of $B$, the function $x = h_B(u) = u + Bu^2$ is a coordinate change 
near the origin; explain why. 

b. Let $g(u) = f(h_B(u))$ represent $f$ under the coordinate change. Show that $B$ 
can be chosen so that $g''(0) = A$, where $A$ is an arbitrary number. Write a 
formula that expresses how $B$ depends on $A$. 


7.2. Construct the symmetric matrix that corresponds to each of the following 
quadratic forms. 

a. $Q(x,y) = 5x^2 + 18xy - 2y^2$. 

b. $Q(x,y) = xy - x^2 - y^2$. 

c. $Q(x,y) = (2x-y)(2y-x)$. 

d. Q(x.y) = (x y) Q. 


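7.3. Show that, when $m$ is small, the roots of $4x^3 - 4x + m = 0$ (cf. p. 232) are 
approximately $m/4$ and $\pm 1 - m/8$. Specifically, you can show that 

$$\frac{\partial f_m}{\partial x}\left(\pm 1 - \frac{m}{8},\, 0\right) = O(m^2), \qquad \frac{\partial f_m}{\partial x}\left(\frac{m}{4},\, 0\right) = O(m^3).$$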


7.4. Verify that the derivative $d\mathbf{h}_{(0,0)}$ of the coordinate change map given on 
page 234 has the form shown on page 235. 

7.5. Construct the symmetric matrix that corresponds to each of the following 
quadratic forms. 

a. e(w) = 10» —2y, + =. 

b. $Q(x,y,z) = (x-y+z)(x+y-z)$. 

c. $Q(x_1,\ldots,x_n) = \sum_{i=1}^{n} (i-5)\,x_i^2$. 

d. $Q(x_1,\ldots,x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} (i+j)\,x_i x_j$. 

e. $Q(x_1,\ldots,x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} (i-j)\,x_i x_j$. 


7.6. Let $f(x,y) = 3x^2 - x^3 - y^2$. 

a. Verify that $(0,0)$ and $(2,0)$ are critical points of $f$. 

b. Find the second-order Taylor polynomial for $f$ at $(2,0)$; call it $P(x,y)$. 
Graph together $f(x,y)$ and $P(x,y)$ on a small neighborhood of $(2,0)$; 
specifically, use $1.9 \le x \le 2.1$, $-0.1 \le y \le 0.1$. 

c. Does $P$ have a critical point at $(2,0)$? What kind? Do the graphs show that 
$P$ and $f$ have the same type of critical point at $(2,0)$? What kind of critical 
point does $f$ have at $(2,0)$? 

d. Find the second-order Taylor polynomial for $f$ at $(0,0)$; call it $Q(x,y)$. 
Graph together $f(x,y)$ and $Q(x,y)$ on a small neighborhood of $(0,0)$; 
specifically, use $-0.1 \le x \le 0.1$, $-0.1 \le y \le 0.1$. 

e. What kind of critical point does $Q$ have at $(0,0)$? Do the graphs show that 
$Q$ and $f$ have the same type of critical point at $(0,0)$? What kind of critical 
point does $f$ have at $(0,0)$? 

7.7. a. Find all critical points of $f(x,y) = x^3 + y^2 - 3x - 12y$. 

b. At each critical point $P$ of $f$, construct the second-order Taylor 
polynomial $T_P$ of $f$. Does $T_P(x,y)$ also have a critical point at $P$? What kind? 

c. In a small neighborhood of each of the critical points $P$, sketch the graph 
of $f$ together with the Taylor polynomial $T_P$. Does $f$ resemble $T_P$ near $P$? 
Is $P$ the same type of critical point for $f$ that it is for $T_P$? 

d. Conclusion: List the critical points of $f$, and indicate the type of each. 

7.8. a. Find all critical points of $f(x,y) = x^3 - 3xy^2 - x^2 + 3y^2$. 

b. At each critical point $P$ of $f$, construct the second-order Taylor 
polynomial $T_P$ of $f$. Does $T_P(x,y)$ also have a critical point at $P$? What kind? 

c. In a small neighborhood of each of the critical points $P$, sketch the graph 
of $f$ together with the Taylor polynomial $T_P$. Does $f$ resemble $T_P$ near $P$? 
Is $P$ the same type of critical point for $f$ that it is for $T_P$? 

d. Conclusion: List the critical points of $f$, and indicate the type of each. 

7.9. Locate the critical point of $Q(x,y) = ax^2 + 2bxy + cy^2 + dx + ey + k$ and 
determine its type. On which of the six parameters does the location depend, and 
on which does the type depend? 

7.10. Locate all the critical points of $E(\theta, v) = 1 - \cos\theta + \tfrac{1}{2}v^2$, and determine the 
type of each. 

7.11. Let $z = f(x,y) = p^2x^2 + q^2y^2$, $0 < p^2 < q^2$, and let $D_a(x,y)$ be the square of 
the distance from the point $(0,0,a)$ on the $z$-axis to the point $(x,y,f(x,y))$ on 
the graph of $f$. 

a. Make a sketch. 

b. Show that $D_a$ has a critical point at the origin, for every $a$. 

c. For two values of $a$, that critical point is degenerate; determine those 
values. 

d. At all other points $a$, determine how the type of the critical point depends 
on $a$. 

7.12. Show that the formula originally used for the quadratic terms in Taylor’s 
expansion (see Theorem 3.18, p. 94, and the discussion leading up to it) gives 
the same value as the formula using the Hessian. That is, show 

$$(\Delta\mathbf{u} \cdot \nabla)^2 f(\mathbf{a}) = (\Delta\mathbf{u})^\dagger H_{\mathbf{a}}\, \Delta\mathbf{u},$$

where $\nabla$ is the gradient differential operator. 
7.13. Let $M$ be an $n \times n$ real symmetric matrix, and let $\lambda$ be an eigenvalue of $M$. 
The purpose of this exercise is to prove that $\lambda$ is real. So suppose the contrary; 
let $\lambda = a + bi$ (with $b \ne 0$), and let $Z = X + iY$ (where $X$ and $Y$ are real $n \times 1$ 
vectors) be a complex eigenvector for $\lambda$: $MZ = \lambda Z$ and $Z \ne 0$. 

a. Let $\bar\lambda = a - bi$ be the complex conjugate of $\lambda$, and let $\bar Z = X - iY$ be the 
complex conjugate of $Z$. Show that $\bar\lambda$ is also an eigenvalue of $M$ (Hint: 
What is $\overline{M}$?) with eigenvector $\bar Z$. 

b. Show that the $1 \times 1$ matrix $\bar Z^\dagger M Z$ equals $\lambda \|Z\|^2$. 

c. The conjugate transpose of the $k \times l$ matrix $A$ is the $l \times k$ matrix $\bar A^\dagger$. Show 
that the conjugate transpose of $\bar Z^\dagger M Z$ equals $\bar\lambda \|Z\|^2$. 

d. Compare $\bar Z^\dagger M Z$ and its conjugate transpose to conclude that $\lambda = \bar\lambda$, 
showing $\lambda$ is real. 

7.14. Let $M$ be an $n \times n$ real symmetric matrix. The purpose of this exercise is to 
show that eigenvectors of different eigenvalues must be orthogonal. So 
suppose $X_1$ is an eigenvector of $M$ with eigenvalue $\lambda_1$ and $X_2$ is an eigenvector 
with eigenvalue $\lambda_2 \ne \lambda_1$. 

a. Show that $X_2^\dagger M X_1 = \lambda_1 (X_2 \cdot X_1)$, where $X_2 \cdot X_1$ is the ordinary “dot product” 
of vectors. 

b. Use $(MX_2)^\dagger = X_2^\dagger M$ (why is this true?) to show that $X_2^\dagger M X_1 = \lambda_2 (X_2 \cdot X_1)$. 
Conclude that $X_2 \cdot X_1 = 0$. 

7.15. a. Find the functions $p_i(\Delta x, \Delta y)$, $i = 1, 2$, provided by Lemma 7.3 for the 
function $F(x,y) = e^x \sin y$ when $(a,b) = (0,0)$. 

b. Verify that $e^x \sin y = p_1(x,y)\,x + p_2(x,y)\,y$. 

7.16. The folium of Descartes $f(x,y)$ (p. 237) evidently has a saddle point at the 
origin. This exercise provides new local coordinates $(u,v)$ in which the folium 
takes the form $-u^2 + v^2$. 

a. Determine $\varphi(\xi,\eta) = f(\xi - \eta,\, \xi + \eta)$; this is the form the folium takes 
under a (global) 45° rotation and dilation $c(\xi,\eta)$. 

b. Show that $\varphi$ can be written in the form $\alpha(\xi)\xi^2 + \beta(\xi)\eta^2$ and determine 
$\alpha(\xi)$ and $\beta(\xi)$. 

c. Introduce a local coordinate change $(u,v) = k(\xi,\eta)$ near $(\xi,\eta) = (0,0)$ 
that reduces $\varphi$ to $-u^2 + v^2$. Prove that $k$ is a coordinate change near the 
origin. 

d. Let $h = k \circ c^{-1}$. Use a suitable graphing utility to sketch the pullback of a 
coordinate grid in the $(u,v)$-plane by $h$ to show that the pullback carries 
level curves of $-u^2 + v^2$ to level curves of $f(x,y)$. Compare your result 
with the figure on page 239. 

7.17. a. Sketch the zero-level of the function $f(x,y) = (xy^2 - 1)(x^2y - 1)$ in the first 
quadrant and infer that $f$ has a saddle point at $(x,y) = (1,1)$. 

b. Express $f$ in terms of window coordinates at $(1,1)$; that is, determine 
$\Delta z = f(1 + \Delta x, 1 + \Delta y) - f(1,1)$ as a (sixth-degree) polynomial in $\Delta x$ and $\Delta y$. 

c. Show that the functions of Morse’s decomposition (Theorem 7.18) at the 
saddle point are 

$$h_{11} = 2 + \Delta x + \tfrac{8}{3}\Delta y + \tfrac{3}{2}\Delta x\,\Delta y + \tfrac{3}{2}\Delta y^2 + \tfrac{9}{10}\Delta x\,\Delta y^2 + \tfrac{3}{10}\Delta y^3 + \tfrac{1}{5}\Delta x\,\Delta y^3,$$

$$h_{12} = \tfrac{5}{2} + \tfrac{8}{3}\Delta x + \tfrac{8}{3}\Delta y + \tfrac{3}{4}\Delta x^2 + 3\,\Delta x\,\Delta y + \tfrac{3}{4}\Delta y^2 + \tfrac{9}{10}\Delta x^2\,\Delta y + \tfrac{9}{10}\Delta x\,\Delta y^2 + \tfrac{3}{10}\Delta x^2\,\Delta y^2,$$

$$h_{22} = 2 + \tfrac{8}{3}\Delta x + \Delta y + \tfrac{3}{2}\Delta x^2 + \tfrac{3}{2}\Delta x\,\Delta y + \tfrac{3}{10}\Delta x^3 + \tfrac{9}{10}\Delta x^2\,\Delta y + \tfrac{1}{5}\Delta x^3\,\Delta y.$$

d. Verify, by direct computation, that $\Delta z = h_{11}\,\Delta x^2 + 2h_{12}\,\Delta x\,\Delta y + h_{22}\,\Delta y^2$. 

e. Complete the square to obtain the coordinate change $(\Delta u, \Delta v) = \mathbf{h}(\Delta x, \Delta y)$ 
that reduces $\Delta z$ to the simple diagonal form $\Delta z = \Delta u^2 - \Delta v^2$. Prove that $\mathbf{h}$ 
is a coordinate change near the origin in the $(\Delta x, \Delta y)$ window. 

f. Use a suitable graphing utility to sketch the pullback of a coordinate grid 
in the $(\Delta u, \Delta v)$ window to show that level curves of $\Delta u^2 - \Delta v^2$ pull back to 
level curves of $\Delta z$ in the $(\Delta x, \Delta y)$ window. The figure in the margin shows 
the $(\Delta u, \Delta v)$ coordinate grid in the $(\Delta x, \Delta y)$ window, together with level 
curves of $f$. (Levels in the $\Delta u$ direction are twice as far apart as those in 
the $\Delta v$ direction.) 


7.18. a. The function $g(x,y) = (x^2 - y^2)^3 - 2x(x^2 - y^2) + 1$ has a saddle point at 
$(x,y) = (1,0)$. (A 45° rotation-dilation converts the function $f$ of the 
preceding exercise into $g$.) Carry out all the steps of the preceding exercise to 
obtain the local coordinate change $(\Delta u, \Delta v) = \mathbf{h}(\Delta x, \Delta y)$ given by Morse’s 
decomposition that reduces $g$ to $\Delta u^2 - \Delta v^2$ in a window centered at $(1,0)$. 

b. Prove that the local coordinate change provided by Morse’s decomposition 
in this exercise is not a rotation-dilation of the local coordinate change of 
the preceding exercise. Suggestion: Consider the derivative of each 
coordinate change at the origin. 


7.19. Find the functions $h_{ij}(x_1, x_2)$ in Morse’s decomposition (Theorem 7.18) for 
the function $f(x_1, x_2) = \cos x_1 \cos x_2$ at the point $(a_1, a_2) = (0,0)$. Verify that 

$$h_{ij}(0,0) = \frac{1}{2}\,\frac{\partial^2 f}{\partial x_i\,\partial x_j}(0,0)$$

for every $i$, $j$. 








Chapter 8 

Double Integrals 


Abstract Double integrals arise in a variety of scientific contexts, essentially as a 
way to calculate the product of quantities that vary. They are introduced in the first 
multivariable calculus course, together with the iterated (repeated) integrals that are 
often used to evaluate them. This chapter concentrates on definitions and properties, 
and begins with a problem in gravitational attraction that leads to double integrals. It 
then introduces a precise notion of area called Jordan content, and uses that to define 
the integral. The next chapter concentrates on evaluation, using iterated integrals, 
curvilinear coordinates, and the change of variables formula. 


8.1 Example: gravitational attraction 


Newton’s law of universal gravitation says that between any two masses there is 
an attractive force that is proportional to each of the masses and to the inverse square 
of the distance between them. 

Force is a vector quantity. To describe the force that one mass $m$ exerts on 
another $\mu$, we begin with the vector $\mathbf{r}$ that gives the position of $m$ with respect to $\mu$. 
Then the force acts in the direction of the unit vector $\mathbf{u} = \mathbf{r}/\|\mathbf{r}\|$, and its magnitude 
is proportional to $\mu$ and to $m/\|\mathbf{r}\|^2$. If we let $G$ denote the proportionality constant, 
as is customary, then we can write 

$$\text{force} = \mu\,\mathbf{f}, \quad\text{where}\quad \mathbf{f} = \frac{Gm}{\|\mathbf{r}\|^3}\,\mathbf{r}.$$


According to Newton’s second law of motion, the vector $\mathbf{f}$ is the acceleration of the 
“test mass” $\mu$; $\mathbf{f}$ depends only on the mass $m$ and on the position of $\mu$ in relation to $m$. 
Such a vector function of position is called a vector field. This particular vector field 
is the gravitational field due to the mass $m$. To determine the gravitational force that 
$m$ exerts on a test mass $\mu$ located anywhere in space, just multiply the acceleration 
field vector $\mathbf{f}$ at that point by the size $\mu$ of the test mass. The gravitational field 
defined by a collection of masses $m_i$, $i = 1, \ldots, N$, is just the vector sum of their 
individual fields: 

$$\mathbf{f} = \sum_{i=1}^{N} \frac{G m_i}{\|\mathbf{r}_i\|^3}\,\mathbf{r}_i.$$

The gravitational field 

The gravitational field 
of a large square plate 

Approximating the field 

Our formula appears to define the gravitational field of only a finite number of 
masses that are concentrated at discrete points. However, by using limit processes 
(in which sums become integrals), we can extend the formula to continuous 
distributions of matter. To see how this happens, let us analyze the gravitational field of 
a large homogeneous plate of uniform thickness. For such a plate, the mass of any 
piece is simply proportional to its area $A$: 

$$\text{mass} = \rho A.$$

The constant of proportionality $\rho$ gives the mass per unit area, or the mass density, 
of the plate. We determine the gravitational field of the plate only for points directly 
above the center of the plate. (Eventually, we assume that the plate is so large that it 
is effectively infinite in extent. In that case, every location on the plate is like every 
other; we are able to think of any point on the plate as its “center.”) 



For now, let the plate be the square $-R \le x, y \le R$ in the $(x,y)$-plane. We want 
to determine the gravitational field at the point $(0,0,a)$, $a > 0$, on the $z$-axis. We 
can use the additivity of the field: divide the plate into a number of small cells, 
approximate the field due to each cell, and then add the results to get an estimate of 
the field due to the whole plate. 

We choose cells that form a grid of small congruent squares of dimensions $\Delta x = 
\Delta y = R/k$ and mass $\rho\,\Delta x\,\Delta y$. If $R/k$ is small enough, we can approximate the field 
due to a single square by imagining all its mass is concentrated at its center. If the 
center is at $(x_i, y_j, 0)$, then 

$$\mathbf{r} = (x_i, y_j, 0) - (0,0,a) = (x_i, y_j, -a),$$

so the gravitational field due to this one square is approximately 

$$\mathbf{f}_{ij} = \frac{G\rho\,\Delta x\,\Delta y}{(x_i^2 + y_j^2 + a^2)^{3/2}}\,(x_i, y_j, -a).$$
As the figure above indicates, the horizontal component of $\mathbf{f}_{ij}$ is canceled by the 
horizontal component of the field due to the square centered at the symmetrically opposite 
point $(-x_i, -y_j)$ on the plate. Thus, for the whole field $\mathbf{f}$, only its vertical 
component is nonzero. In fact, the four points $(x_i, y_j)$, $(x_i, -y_j)$, $(-x_i, y_j)$, and $(-x_i, -y_j)$ 
contribute the same vertical component, so we can restrict ourselves to squares in 
the first quadrant, writing the four contributions together as the scalar 

$$4 \times \frac{G\rho\,\Delta x\,\Delta y}{(x_i^2 + y_j^2 + a^2)^{3/2}} \times (-a) = \frac{-4G\rho a\,\Delta x\,\Delta y}{(x_i^2 + y_j^2 + a^2)^{3/2}}.$$

Therefore, the vertical component of the field at $z = a$ that is due to the whole plate 
is approximately the double sum 

$$\text{field} \approx \sum_{i=1}^{k}\sum_{j=1}^{k} \frac{-4G\rho a\,\Delta x\,\Delta y}{(x_i^2 + y_j^2 + a^2)^{3/2}}.$$

The coordinates $(x_i, y_j)$ of the centers of the squares in the first quadrant start with 
$(x_1, y_1) = (\Delta x/2, \Delta y/2)$ and then increase by steps of $\Delta x$ and $\Delta y$: 

$$x_1 = \Delta x/2, \qquad y_1 = \Delta y/2,$$

$$x_i = x_{i-1} + \Delta x,\ \ i = 2, \ldots, k, \qquad y_j = y_{j-1} + \Delta y,\ \ j = 2, \ldots, k.$$

Computing the 
double sum 


PROGRAM: The gravitational field of a large plate 

a = .2 
R = 32 
k = 64 
dx = R / k 
dy = dx 
sum = 0 
REM start at the center of the first cell; units are chosen so that 4*G*rho = 1 
x = dx / 2 
FOR i = 1 TO k 
    y = dy / 2 
    FOR j = 1 TO k 
        sum = sum - a * dx * dy / (x^2 + y^2 + a^2) ^ (3/2) 
        y = y + dy 
    NEXT j 
    x = x + dx 
NEXT i 
PRINT k, sum 
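For readers without a BASIC interpreter, the same double sum can be sketched in Python. This is a direct transcription of the loop above, not a different method; the function name `plate_field_sum` and its parameter names are our own, and the units are again assumed to satisfy $4G\rho = 1$.

```python
def plate_field_sum(a, R, k):
    """Midpoint double sum for the vertical field at height a above the center
    of the square plate -R <= x, y <= R, using k*k cells in the first quadrant.
    Units are chosen so that 4*G*rho = 1."""
    dx = R / k
    dy = dx
    total = 0.0
    x = dx / 2                      # center of the first cell in x
    for _ in range(k):
        y = dy / 2                  # center of the first cell in y
        for _ in range(k):
            total -= a * dx * dy / (x * x + y * y + a * a) ** 1.5
            y += dy
        x += dx
    return total

# Same parameters as the BASIC listing, with k doubled a few times.
for k in (64, 128, 256, 512):
    print(k, plate_field_sum(0.2, 32.0, k))
```

With `k = 512` the loop body runs about a quarter of a million times, which takes well under a second on current hardware; `k = 1024` quadruples that.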


Using a simple program such as the one above, we can compute the double 
sum for any given value of $a$. To simplify the computation, however, we first choose 
units that make $4G\rho = 1$. Then, keeping in mind that the dimensions of the plate 
should be large in comparison to the distance to a point in the field (i.e., $R \gg a$), we 
choose $R = 32$ and then do a sequence of calculations for $a = 0.2$, $0.1$, and $0.05$. 

The results appear in the table below. Each column indicates how the estimate 
of field strength changes as the number of squares increases when $a$ is fixed. (Note 
that there are $k^2$ squares, and thus more than a million when $k = 1024$.) It appears 
that the sums are converging to some fixed value as $k$ increases. For example, when 
$a = 0.2$, the first seven or eight digits of the sum have “stabilized” when $k = 1024$, 
suggesting that a limit is emerging: 

$$\text{field at } 0.2 = \lim_{k\to\infty} \text{sum} = -1.561957\ldots$$

But for the same value of $k$, only the first four digits appear to have stabilized when 
$a = 0.1$, and only the first two when $a = 0.05$. However, when $k$ is doubled, even 
the sum for $0.05$ appears to have stabilized out to four digits. This suggests 

$$\text{field at } 0.1 = -1.566\ldots, \qquad \text{field at } 0.05 = -1.568\ldots.$$

How the sum 
varies with k 

The Riemann integral 
as a limit of sums 

“Integral as product” 



       a = 0.2                  a = 0.1                  a = 0.05 
    k      sum               k      sum               k      sum 
   64   -1.233              64   -0.757             128   -0.759 
  128   -1.526             128   -1.238             256   -1.240 
  256   -1.561691          256   -1.530399          512   -1.533 
  512   -1.561957628       512   -1.566110         1024   -1.568320 
 1024   -1.561957637      1024   -1.566377         2048   -1.568587 

We recognize that the double sums are, in fact, Riemann sums for the function 

$$\varphi_a(x,y) = \frac{-4G\rho a}{(x^2 + y^2 + a^2)^{3/2}};$$

thus, the limiting value that we are seeking for each $a$ is the Riemann integral of that 
function for the given $a$: 

$$\text{field at } a = \lim_{k\to\infty} \sum_{i=1}^{k}\sum_{j=1}^{k} \varphi_a(x_i, y_j)\,\Delta x\,\Delta y = \iint\limits_{\substack{0 \le x \le R \\ 0 \le y \le R}} \varphi_a(x,y)\,dx\,dy.$$

In this expression, “ dxdy ” is sometimes called the element of area; it represents the 
limit of the area Ax Ay of a rectangle in the Cartesian grid. When other coordinates 
are used, the element of area may have a different form. However, our expression 
for a double integral always contains an element of area. 

The integral arises here in a typical way: it is a number that is essentially the 
product of two quantities. (For example, field strength is the product of a mass and 
the reciprocal of a distance squared.) But the quantities involved are variable, so 
their product cannot be found directly as a single number. The remedy is to restrict 
the quantities being multiplied to small cells on which they become nearly constant. 
(For example, restrict to small squares on the plate.) Now calculate a representative 
product on each cell, and then add the results over all cells. The sum gives an 
approximation to the numerical value we seek. To get a better approximation, make 
the cells even smaller. If the sums tend to a limit as the cells get smaller, that limit 
is defined to be the integral. In Section 8.3, we use these ideas to define the integral 
and catalogue some of its properties. 
Let us return to the estimates of the gravitational field provided by our 
calculations. Notice that field strength does not vary with the inverse square of the distance 
$a$ to the plate. On the contrary, the calculations suggest that field strength is 
essentially constant for all $a \ll R$. If $R = \infty$, then $a \ll R$ for all finite $a$, so it seems 
reasonable to speculate that the gravitational field of an infinite plate is exactly 
constant. But we cannot check this directly: if $R = \infty$, the Riemann sums we use to 
estimate the field are not defined. (The BASIC program breaks down on its second 
line.) Indeed, we define the Riemann integral only for a bounded domain. 

But there is a standard way out of the difficulty: determine the value of the 
integral as a function of $R$, and then see if the values tend to a well-defined limit as 
$R \to \infty$. When the limit exists, it is called the improper integral. Improper double 
integrals arise from unbounded functions as well as from unbounded domains; we 
define both kinds in Chapter 9. However, we can even now confirm that an infinite 
homogeneous plate produces a constant gravitational field; we just need to start with 
a different shape. 

Because we are interested ultimately in the field of the infinite plate, let us allow 
ourselves to change the shape of its finite approximation. Specifically, we change the 
plate from a square to a circle, and then exploit the circular symmetry by changing 
from Cartesian to polar coordinates. This change leads to a one-variable improper 
integral that determines the field of an infinite plate. 




Let the plate have radius $R$ and be centered at the origin; then it is given by the
inequalities $0 \le r \le R$, $0 \le \theta \le 2\pi$. To divide the plate into small cells, it is natural
to use equally spaced concentric circles $r = \text{constant}$ and radial lines $\theta = \text{constant}$,
with spacings

\[ \Delta r = \frac{R}{k}, \qquad \Delta\theta = \frac{2\pi}{l}, \qquad k, l \text{ positive integers.} \]

The cells are not uniform in size; their areas grow with $r$. Choose the representative
point $(r_i, \theta_j)$ at the center of the cell. Each cell is the portion of a circular ring of
thickness $\Delta r$ that is cut out by a central angle $\Delta\theta$. The area of the whole ring between
$r_i - \Delta r/2$ and $r_i + \Delta r/2$ is

\[ \pi\left(r_i + \frac{\Delta r}{2}\right)^2 - \pi\left(r_i - \frac{\Delta r}{2}\right)^2 = 2\pi r_i\,\Delta r. \]



A single cell occupies the fraction $\Delta\theta/2\pi$ of this ring, so its area is

\[ \left(\frac{\Delta\theta}{2\pi}\right) 2\pi r_i\,\Delta r = r_i\,\Delta r\,\Delta\theta = \Delta A(r_i, \Delta r, \Delta\theta) = \Delta A_i, \]

and we can write its mass as $\rho\,\Delta A_i$. Assuming the mass is concentrated at $(r_i, \theta_j)$, we
estimate the cell's contribution to the gravitational field at $z = a$ as approximately

\[ \frac{-G\rho a\,\Delta A_i}{(r_i^2 + a^2)^{3/2}}. \]

Therefore, the field at $z = a$ that is due to the whole plate is now approximated by
the double sum

\[ \text{field at } a \approx \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{-G\rho a\,\Delta A_i}{(r_i^2 + a^2)^{3/2}}. \]

Notice that the values $\theta_j$ are absent: all cells in a given ring (i.e., with fixed $r_i$)
make the same contribution. This symmetry with respect to $\theta$ means we can write
the inner sum (where $i$, and hence $r_i$, is fixed) as

\[ \sum_{j=1}^{l} \frac{-G\rho a\,\Delta A_i}{(r_i^2 + a^2)^{3/2}} = \frac{-G\rho a}{(r_i^2 + a^2)^{3/2}} \sum_{j=1}^{l} \Delta A_i = \frac{-G\rho a}{(r_i^2 + a^2)^{3/2}}\, 2\pi r_i\,\Delta r, \]

because $\sum_j \Delta A_i$ is just the area of an entire circular ring. Thus the field due to the
whole plate reduces to a sum over a single index:

\[ \text{field at } a \approx -2\pi G\rho a \sum_{i=1}^{k} \frac{r_i\,\Delta r}{(r_i^2 + a^2)^{3/2}}. \]

But this is just a Riemann sum for the one-variable function

\[ h(r) = -2\pi G\rho a\,\frac{r}{(r^2 + a^2)^{3/2}}. \]

Therefore, as $k \to \infty$ and $\Delta r \to 0$, the sum becomes the ordinary (i.e., single) integral


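\[ \int_0^R h(r)\,dr = -2\pi G\rho a \int_0^R \frac{r\,dr}{(r^2 + a^2)^{3/2}} = 2\pi G\rho a \left[\frac{1}{(r^2 + a^2)^{1/2}}\right]_0^R = 2\pi G\rho a \left(\frac{1}{(R^2 + a^2)^{1/2}} - \frac{1}{a}\right). \]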


Because the sum becomes a better and better approximation to the field as $\Delta r \to 0$
(when $k \to \infty$), we write

\[ \text{field at } a = -2\pi G\rho a \int_0^R \frac{r\,dr}{(r^2 + a^2)^{3/2}} = 2\pi G\rho \left(\frac{a}{\sqrt{R^2 + a^2}} - 1\right). \]

Let us compare this result with what we computed for the square. Thus we set
$4G\rho = 1$ and $R = 32$, and then find

\[ \text{field at } 0.2 = -1.56098, \quad \ldots \text{ at } 0.1 = -1.56589, \quad \ldots \text{ at } 0.05 = -1.56834. \]


These values are virtually identical with the corresponding ones for the square. Now
that we have an exact formula for the field, it is easy to see what happens as the plate
becomes infinite in extent, that is, as we let $R \to \infty$. Because

\[ \lim_{R\to\infty} \frac{a}{\sqrt{R^2 + a^2}} = 0, \]

we find that

\[ \text{field} = 2\pi G\rho \cdot (-1) = -2\pi G\rho = \text{constant}; \]

the field is indeed independent of the distance $a$ from the plate. Furthermore,
after the normalization $4G\rho = 1$, the field strength takes on the constant value
$-\pi/2 = -1.570796\ldots$. In fact, we express the field of the infinite plate as an
improper integral, that is, as the limit of a sequence of “proper” integrals:

\[ \int_0^\infty \frac{r\,dr}{(r^2 + a^2)^{3/2}} = \lim_{R\to\infty} \int_0^R \frac{r\,dr}{(r^2 + a^2)^{3/2}} = \lim_{R\to\infty} \left(\frac{-1}{(R^2 + a^2)^{1/2}} + \frac{1}{a}\right) = \frac{1}{a}. \]

We evaluate improper double integrals in a similar way (cf. Chapter 9.2).
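
As a numerical check on the closed formula for the circular plate, the following Python sketch evaluates a midpoint Riemann sum over $k$ rings next to the exact expression, under the normalization $4G\rho = 1$. The function names and the default $k = 20000$ are illustrative assumptions; this is not the BASIC program mentioned earlier.

```python
import math

TWO_PI_G_RHO = math.pi / 2  # value of 2*pi*G*rho when 4*G*rho = 1


def field_exact(a, R):
    """Closed form 2*pi*G*rho * (a / sqrt(R^2 + a^2) - 1)."""
    return TWO_PI_G_RHO * (a / math.sqrt(R * R + a * a) - 1.0)


def field_riemann(a, R, k=20000):
    """Midpoint sum -2*pi*G*rho*a * sum of r_i*dr / (r_i^2 + a^2)^(3/2)."""
    dr = R / k
    total = 0.0
    for i in range(k):
        r = (i + 0.5) * dr  # radius at the center of ring i
        total += r * dr / (r * r + a * a) ** 1.5
    return -TWO_PI_G_RHO * a * total


if __name__ == "__main__":
    for a in (0.2, 0.1, 0.05):
        print(a, field_exact(a, 32.0), field_riemann(a, 32.0))
    # For a very large plate the exact value is close to -pi/2.
    print(field_exact(0.2, 1e6), -math.pi / 2)
```

The sum needs $\Delta r$ well below $a$ to resolve the integrand near $r = 0$, which is why a fairly large $k$ is used for $R = 32$.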

Let us return to the finite circular plate and the double sum formula

\[ \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{-G\rho a\,\Delta A_i}{(r_i^2 + a^2)^{3/2}} \]

that expresses the approximate field strength in polar coordinates. As we did with
Cartesian coordinates, we can recognize that these are Riemann sums for the function

\[ \psi_a(r, \theta) = \frac{-G\rho a}{(r^2 + a^2)^{3/2}}. \]

The exact value of the field strength is thus the Riemann integral that represents the
limit of these sums:

\[ \text{field at } a = \lim_{k,l\to\infty} \sum_{i=1}^{k} \sum_{j=1}^{l} \psi_a(r_i, \theta_j)\,\Delta A_i = \iint\limits_{\substack{0 \le r \le R \\ 0 \le \theta \le 2\pi}} \psi_a(r, \theta)\,dA. \]

In this expression, $dA$ is the element of area (cf. p. 272); it represents the limit of
the area $\Delta A$ of a cell in the polar coordinate grid.
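
The cell-area formula $\Delta A = r\,\Delta r\,\Delta\theta$ can be checked numerically: summed over the whole polar grid, the cell areas should reproduce the area $\pi R^2$ of the plate. The Python sketch below does this; the function name and the grid sizes in the usage note are illustrative assumptions.

```python
import math


def polar_grid_area(R, k, l):
    """Sum the areas r_i * dr * dtheta of all k*l cells of the polar grid."""
    dr = R / k
    dtheta = 2 * math.pi / l
    total = 0.0
    for i in range(k):
        r_i = (i + 0.5) * dr            # radius at the center of ring i
        total += l * r_i * dr * dtheta  # the l cells of ring i have equal area
    return total
```

Because a ring with mid-radius $r_i$ has area exactly $2\pi r_i\,\Delta r$, a call such as `polar_grid_area(32.0, 50, 80)` should agree with `math.pi * 32.0**2` up to rounding, even on a coarse grid.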

For a Cartesian grid, $dA = dx\,dy$ is just the product of the “elements of length”
$dx$ and $dy$ for the individual coordinates. However, $dA \ne dr\,d\theta$ for a polar grid. On
the contrary, we have already noted that $\Delta A = (\Delta r)(r\,\Delta\theta)$. Although $\Delta\theta$ is
dimensionless, $r\,\Delta\theta$ does have the dimensions of a length: $r\,\Delta\theta$ is the length of the arc


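subtended by the angle $\Delta\theta$ on the circle of radius $r$. Informally, then, $r\,d\theta$ and $dr$
are the “elements of length” whose product is the element of area $dA = r\,dr\,d\theta$.
Thus we write

\[ \text{field at } a = \iint\limits_{\substack{0 \le r \le R \\ 0 \le \theta \le 2\pi}} \psi_a(r, \theta)\,r\,dr\,d\theta = -G\rho a \iint\limits_{\substack{0 \le r \le R \\ 0 \le \theta \le 2\pi}} \frac{r\,dr\,d\theta}{(r^2 + a^2)^{3/2}}. \]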


8.2 Area and Jordan content 

The definition of the Riemann integral of a function over a region is simple in outline.
First, partition the region into many small pieces; then multiply the “size” of
each piece by a value that the function takes on somewhere in that piece, and sum
those products; finally, repeat the process with ever smaller pieces and take the limit
of the computed sums. To convert this sketch into something useful and precise, one
of the questions we must decide is what kind of pieces can be used to make up a
partition.

When the function depends on just a single real variable $x$, the answer is immediate:
each small piece is an interval $a \le x \le b$ whose “size” is its length, $b - a$. But if
the function depends on two real variables, $x$ and $y$, the answer is not so clear.
Certainly there is a 2-dimensional analogue for an interval; it is a rectangle $a \le x \le b$,
$c \le y \le d$ in the $(x,y)$-plane with sides parallel to the axes, whose “size” is given
by its area $A = (b - a)(d - c)$. But now consider what happens under a change of
variables. On the line, a small interval is generally transformed into another small
interval. On the plane, however, a small rectangle is transformed into a quadrilateral
with curved sides (see Chapter 4), so these more general shapes appear in partitions
as naturally as rectangles do. We therefore get a better answer to the question by
focusing not on the shape of a small piece but on whether we can assign it a “size.”

The size of a region will be its area, of course; we have to worry about admissible 
shapes because not every region in the plane has a well-defined area. For example, 
there is no consistent way to assign a nonzero area to the set of points in the unit 
square that have rational coordinates (the rational points); see below, page 279. 
If this is not immediately evident, however, it may be because our notion of area 
is more intuitive than precise. Thus, we construct a precise notion of size (called 
Jordan content ) that captures our intuitive ideas about area, extends immediately to 
higher dimensions, and fits well with the process of integration. 

Before we discuss Jordan content, though, we must establish some basic topology 
concerning the interior and the boundary of a set in the plane. 

Definition 8.1 The open (respectively, closed) disk of radius $r > 0$ centered at the
point $\mathbf{p}$ in $\mathbb{R}^2$ is the set of all points $\mathbf{x}$ in $\mathbb{R}^2$ for which $\|\mathbf{x} - \mathbf{p}\| < r$ (respectively,
$\|\mathbf{x} - \mathbf{p}\| \le r$).

Definition 8.2 A point $\mathbf{p}$ is an interior point of a set $S$ in $\mathbb{R}^2$ if some open disk
centered at $\mathbf{p}$ is contained entirely in $S$.

Definition 8.3 A point $\mathbf{q}$ is an exterior point of $S$ if it is an interior point of $S^c$, the
complement of $S$ in $\mathbb{R}^2$. A point $\mathbf{b}$ is a boundary point of $S$ if it is neither an interior
nor an exterior point of $S$.


Every interior point of $S$ lies inside $S$, of course; what makes it an interior point,
however, is the fact that it is surrounded by points that also lie inside $S$. Likewise,
every exterior point lies outside $S$ and is surrounded by points outside $S$. An individual
boundary point may lie either inside or outside $S$, but every open disk centered
at a boundary point contains at least one point in $S$ and one point outside $S$ (see
Exercise 8.5). For example, the open and closed disks with a given radius and center
have the same boundary points, namely the points on the circle with that radius
and center. They represent two extremes: the closed disk contains all its boundary
points, but the open disk contains none. These become the models for closed and
open sets in general.

Definition 8.4 A set is closed if it contains all its boundary points; it is open if it 
contains none of them. 


[Figure: a closed disk and an open disk.]


Thus every point of an open set is an interior point. It is more common to define 
a closed set, however, as one whose complement is open. The following theorem 
shows that our definition is equivalent to the usual one. 

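Theorem 8.1. The set $S$ is closed $\iff$ the complement $S^c$ is open.

Proof.
\[ \begin{aligned} S \text{ is closed} &\iff S^c \text{ contains no boundary points of } S \\ &\iff S^c \text{ contains only exterior points of } S \\ &\iff \text{all points of } S^c \text{ are interior points of } S^c \\ &\iff S^c \text{ is open.} \end{aligned} \] □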

Definition 8.5 The interior of $S$, denoted $\mathring{S}$, is the set of interior points of $S$; the
boundary of $S$, denoted $\partial S$, is the set of boundary points of $S$; the closure of $S$,
denoted $\bar{S}$, is $S \cup \partial S$.

Thus, $S$ is open if $S = \mathring{S}$ and is closed if $S = \bar{S}$. We “open” $S$ (i.e., create its interior)
by removing from $S$ all its boundary points; we “close” $S$ (create its closure) by
adding to $S$ all its boundary points. The symbol we have introduced for boundary
may have no good rationale at the moment, but its aptness should become clear
when we reconsider Green’s theorem in Chapter 10; see especially page 427.

When $S$ is a familiar shape, such as a polygon, its interior, exterior, and boundary
are what we expect. But when $S$ is less familiar, intuition may be a poor guide.
For example, even though a polygon’s boundary separates the interior from the
exterior, this is not true for all sets. For one thing, the boundary may not be a finite
collection of curves or line segments. Take $S$ to be the plane minus the origin; then
$\partial S$ is just the origin and there are no exterior points at all. According to Exercise 8.5,
every disk centered at a boundary point must contain a point outside $S$; in this case,
the only such point is the center of the disk itself.

For an even more instructive example, take $S$ to be the open unit disk centered
at the origin, minus the portion $L$ of the $x$-axis that lies within 1 unit of the origin.
The exterior points $\mathbf{q}$ of $S$ are those for which $\|\mathbf{q}\| > 1$. The boundary of $S$ is a pair
of curves: the unit circle and the line segment $L$. Any neighborhood of a point on
the unit circle does indeed contain both interior and exterior points of $S$, but that
is not true of $L$. At any point $\mathbf{b}$ of $L$, a sufficiently small open disk centered at $\mathbf{b}$
will contain no exterior points of $S$ whatsoever. Therefore, we cannot say that $L$
“separates” the interior of $S$ from its exterior.

For a set that cannot be sketched easily, its interior and boundary may be even
more nonintuitive. For example, take $S$ to be the set of all points in the closed disk of
radius 1 together with all rational points in the disk of radius 2, both centered at the
origin. The interior of $S$ is the open disk of radius 1, and the exterior of $S$ consists
of all points $\mathbf{q}$ with $\|\mathbf{q}\| > 2$. The boundary of $S$ is everything else; it is the annulus
of points $\mathbf{b}$ with $1 \le \|\mathbf{b}\| \le 2$. Here is a set whose boundary is “thick”; although
$\partial S$ does separate the interior and the exterior of $S$, it is not a simple 1-dimensional
curve. Another example in the same vein is the set of all rational points in the plane.
The interior and the exterior are both empty, so the boundary is the entire plane. As
we show, sets such as these that have “thick” boundaries will always fail to have
Jordan content.


To define Jordan content, we use a method that we introduce now in an intuitive 
and ad hoc way to find the areas of two particular sets. In one case, the method 
succeeds; in the other, it does not, illustrating how a set can fail to have Jordan 
content. 

The fundamental shape whose area we know is a square. Consider how we might
use only squares to approximate the area of the closed unit disk $S$. Cover the plane
with a grid of squares of width $w$. If $L$ of them lie entirely inside $S$, then the area
of $S$ must be at least $Lw^2$. If $U$ of them meet $S$, then the area of $S$ can be no more
than $Uw^2$. For example, if $w = 1/5$ and the origin is one of the grid intersection
points (below, left), then we can use 3-4-5 right triangles to show that $L = 60$ gray
squares (15 in each quadrant) lie inside $S$, and $U = 104$ squares altogether meet $S$,
implying

\[ 2.4 = \frac{60}{25} = Lw^2 \le \operatorname{area} S \le Uw^2 = \frac{104}{25} = 4.16. \]

These are lower and upper bounds for the area, but neither is a good estimate for the
correct value, $\pi \approx 3.14$. The smaller fails to count area that it should and the larger
counts area that it should not. The difference between the bounds, which indicates
the coarseness of the estimates, equals the total area $44/25 = 1.76$ of the white
squares in the figure on the left.

To get better estimates, use smaller squares; then more of the area inside $S$ will be
counted, and less outside. Take $w = 1/10$, giving us four small squares in each large
square (above, right). We find nine additional small squares (darker gray) inside $S$
in each quadrant, so $L = 4 \times 60 + 36 = 276$. Furthermore, 14 new small squares
(hatched) in each quadrant now lie completely outside $S$, so $U = 4 \times 104 - 56 = 360$,
and the new bounds are

\[ 2.76 = Lw^2 \le \operatorname{area} S \le Uw^2 = 3.6. \]

The difference between the new bounds (which is the area of the 84 small white
squares in the figure on the right) is less than half what it was in the previous stage;
in this sense, the new estimates are twice as good as the old. (Their average, 3.18, is
within about 1.2% of the true area.)
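
The counts $L$ and $U$ can be reproduced by brute force. The Python sketch below scales coordinates by $n = 1/w$ so that every comparison uses exact integers; it treats both the grid squares and the disk as closed sets, so a square that touches the circle at a single point counts as meeting $S$. The function name is an illustrative assumption.

```python
def count_disk_squares(n):
    """Return (L, U) for the closed unit disk and grid squares of width 1/n.

    After scaling by n, the square with lower-left corner (i, j) is
    [i, i+1] x [j, j+1] and the disk has radius n.
    """
    inside = meeting = 0
    for i in range(-n - 1, n + 1):
        for j in range(-n - 1, n + 1):
            # Coordinates of the farthest and nearest points from the origin.
            far_x, far_y = max(abs(i), abs(i + 1)), max(abs(j), abs(j + 1))
            near_x = i if i >= 0 else -(i + 1)
            near_y = j if j >= 0 else -(j + 1)
            if far_x * far_x + far_y * far_y <= n * n:
                inside += 1   # the whole square lies in the disk
            if near_x * near_x + near_y * near_y <= n * n:
                meeting += 1  # the square shares at least one point with the disk
    return inside, meeting


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        L, U = count_disk_squares(n)
        print(n, L, U, L / n**2, U / n**2)
```

For $n = 5$ and $n = 10$ the function returns the pairs $(60, 104)$ and $(276, 360)$ found above; larger $n$ shows the two bounds closing in on $\pi$.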

Calculations made with further refinements of the grid (see Exercise 8.17) suggest
that the difference between the bounds, as measured by the area of the white
squares, shrinks to zero, forcing our estimates of the area of $S$ toward a single
value. Notice that those white squares also cover the boundary circle, implying that
the area of the boundary is zero. The circle is, of course, a pair of graphs; as we show
(Theorem 8.10, p. 284), the graph of a continuous function on a bounded interval
will always have zero area. In summary, $S$ has an area, and its boundary is “thin”
enough to have zero area.

For our second example, take $S$ to be the set of rational points in the unit square
that lies in the first quadrant with a corner at the origin. Let us use the same method
of counting squares to find the area of $S$, assuming it has an area. As before, we
cover the plane with a grid of squares of width $w$. But now, because $S$ has no interior
points, no grid square lies entirely inside $S$. In other words, there are no solid gray
squares; $L$ is always zero. On the other hand, many grid squares have a point in
common with $S$, so $U > 0$. For example, if $w = 1/5$, then we can count 25 such grid
squares inside the unit square itself, plus 5 more meeting the unit square along each
of its four sides, plus 1 more at each corner; thus $U = 25 + 4 \times 5 + 4 = 49$. If $S$
does have an area, then

\[ 0 = Lw^2 \le \operatorname{area} S \le Uw^2 = \frac{49}{25} = 1.96. \]

When we set $w = 1/10$ to refine the grid, then $U = 100 + 4 \times 10 + 4 = 144$ and
the bounds become

\[ 0 = Lw^2 \le \operatorname{area} S \le Uw^2 = \frac{144}{100} = 1.44. \]



More generally, if $w = 1/2^n$, then $U = 2^{2n} + 4 \times 2^n + 4$ and the bounds are

\[ 0 = Lw^2 \le \operatorname{area} S \le Uw^2 = 1 + \frac{1}{2^{n-2}} + \frac{1}{2^{2n-2}}. \]

No matter how small we make $w$, the bounds never get any better than

\[ 0 \le \operatorname{area} S \le 1. \]

Our method of counting squares thus fails to assign a meaningful value to the area
of $S$.
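
The upper bound for the rational points can be tabulated exactly. The Python sketch below assumes, as in the count above, that a grid square of width $w = 1/n$ meets $S$ exactly when it meets the closed unit square, so that $U = (n + 2)^2$; the function name is illustrative.

```python
from fractions import Fraction


def outer_bound(n):
    """Exact value of U * w^2 for the rational points of the unit square, w = 1/n."""
    U = (n + 2) ** 2  # n*n squares inside, 4*n along the sides, 4 at the corners
    return U * Fraction(1, n) ** 2


if __name__ == "__main__":
    for n in (5, 10, 100, 1000):
        print(n, outer_bound(n), float(outer_bound(n)))
```

The values decrease toward 1 but always exceed it, while the lower bound stays at 0.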

How is this failure connected to the size, or “thickness,” of $\partial S$? For the closed
disk, the difference $Uw^2 - Lw^2$ was the area of the white squares that covered the
boundary circle; it served as an upper bound on the area of that circle. For the rational
points in the square, the difference is

\[ Uw^2 - Lw^2 = Uw^2 > 1 \]

for all $w > 0$; in particular, the difference does not converge to zero. Indeed, the area
of $\partial S$ is 1. To see why, recall from our earlier examples (p. 278) that $\partial S = \bar{S}$ is the
closed unit square, so $\operatorname{area} \partial S = \operatorname{area} \bar{S} = 1$. Thus, in contrast with the disk, the set
of rational points in the square does not have an area, but its boundary is “thick” and
does have positive area.

We can now formalize the method we use to define the Jordan content of a set.
There are three steps. First, we count grid squares to get monotonic sequences of
“inner” and “outer” areas. Second, we compute the limiting areas as the grids are
refined. Third, we see whether the two limits are equal; if they are, the set has Jordan
content equal to the common value. Although it might seem reasonable to choose the
grids based on how well they are adapted to a given set, it is not a priori evident that
the value obtained from one sequence of grids would then equal the value obtained
from another. Thus, to eliminate ambiguity, we always use just one collection of
grids, $\mathcal{J}_0, \mathcal{J}_1, \mathcal{J}_2, \ldots$. Later (pp. 287ff.), we in fact introduce other grids and prove
that they yield the same value given by the grids $\mathcal{J}_k$.





The grid $\mathcal{J}_0$ consists of the closed unit squares in the $(x,y)$-plane that are bounded
by the vertical lines $x = \text{integer}$ and the horizontal lines $y = \text{integer}$. To get the
squares of the next grid $\mathcal{J}_1$, divide each unit square into four congruent subsquares.
Because every square of $\mathcal{J}_1$ lies entirely in a single square of $\mathcal{J}_0$, we say $\mathcal{J}_1$ is a
refinement of $\mathcal{J}_0$. Use the same procedure to get $\mathcal{J}_2$ as a refinement of $\mathcal{J}_1$, and so on.
The squares in the grid $\mathcal{J}_k$ at stage $k$ have width $w = 1/2^k$.

Definition 8.6 Let $\underline{J}_k(S)$ denote the total area of the squares in $\mathcal{J}_k$ that are entirely
contained in the bounded set $S$, and let $\overline{J}_k(S)$ denote the total area of the squares in
$\mathcal{J}_k$ that intersect $S$.

The “inner” and “outer” area estimates for $S$ ($\underline{J}_k(S)$ and $\overline{J}_k(S)$, respectively) at the
various stages are nested together in the following chain:

\[ 0 \le \underline{J}_0(S) \le \cdots \le \underline{J}_k(S) \le \underline{J}_{k+1}(S) \le \cdots \le \overline{J}_{l+1}(S) \le \overline{J}_l(S) \le \cdots \le \overline{J}_0(S) < \infty. \]

To see this, note first that $0 \le \underline{J}_0(S)$ because our squares all have positive area.
Also, $\overline{J}_0(S)$ is finite because the bounded set $S$ can meet only a finite number of unit
squares. The remaining inequalities in the chain follow from the next two lemmas.

Lemma 8.1. For any bounded set $S$ and integer $k \ge 0$, $\underline{J}_k(S) \le \underline{J}_{k+1}(S)$ and
$\overline{J}_{k+1}(S) \le \overline{J}_k(S)$.

Proof. If a square is counted in $\underline{J}_k(S)$, then its four subsquares, with the same total
area, are counted in $\underline{J}_{k+1}(S)$; hence $\underline{J}_k(S) \le \underline{J}_{k+1}(S)$.

Similarly, if a square is counted in $\overline{J}_{k+1}(S)$, then the square in $\mathcal{J}_k$ that contains it
is counted in $\overline{J}_k(S)$; that implies $\overline{J}_{k+1}(S) \le \overline{J}_k(S)$. □

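Lemma 8.2. For any bounded set $S$ and integers $k, l \ge 0$, $\underline{J}_k(S) \le \overline{J}_l(S)$.

Proof. First note that $\underline{J}_j(S) \le \overline{J}_j(S)$ for every $j \ge 0$, because any square counted in
$\underline{J}_j(S)$ is also counted in $\overline{J}_j(S)$. Then (using Lemma 8.1),

\[ \begin{aligned} k \ge l &\implies \underline{J}_k(S) \le \overline{J}_k(S) \le \overline{J}_l(S); \\ k < l &\implies \underline{J}_k(S) \le \underline{J}_l(S) \le \overline{J}_l(S). \end{aligned} \] □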

The nested inequalities imply that the sequence $\underline{J}_k(S)$ of “inner areas” is monotonic
increasing and bounded above; hence it has a finite limit. The sequence $\overline{J}_k(S)$ is
monotonic decreasing and bounded below, so it has a finite limit, too.

Definition 8.7 The inner and outer Jordan content of the bounded set $S$ are the
respective limits

\[ \underline{J}(S) = \lim_{k\to\infty} \underline{J}_k(S) \quad\text{and}\quad \overline{J}(S) = \lim_{k\to\infty} \overline{J}_k(S). \]

Lemma 8.2 implies that $\underline{J}(S) \le \overline{J}(S)$, but $\underline{J}(S)$ and $\overline{J}(S)$ need not be equal. For
example, when $S$ is the set of rational points in the unit square, our earlier work
shows that $\underline{J}(S) = 0$ and $\overline{J}(S) = 1$.

Definition 8.8 If $\underline{J}(S) = \overline{J}(S)$, then we say $S$ is Jordan measurable, or is a J-set,
and its Jordan content is $J(S) = \underline{J}(S) = \overline{J}(S)$.
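
To see the nested chain of inner and outer areas in a concrete case, the following Python sketch computes both areas exactly for the closed unit disk on the dyadic grids of width $1/2^k$. The choice of the disk as the test set and the function name are assumptions made for illustration; coordinates are scaled by $n = 2^k$ so that all comparisons use integers.

```python
from fractions import Fraction


def inner_outer(k):
    """Exact inner and outer areas of the closed unit disk on the grid of width 1/2**k."""
    n = 2 ** k
    inner = outer = 0
    for i in range(-n - 1, n + 1):
        for j in range(-n - 1, n + 1):
            # Squared distances from the origin to the farthest and nearest
            # points of the scaled square [i, i+1] x [j, j+1].
            far = max(abs(i), abs(i + 1)) ** 2 + max(abs(j), abs(j + 1)) ** 2
            near = (i if i >= 0 else -i - 1) ** 2 + (j if j >= 0 else -j - 1) ** 2
            inner += far <= n * n   # square lies entirely in the disk
            outer += near <= n * n  # square meets the disk
    return Fraction(inner, n * n), Fraction(outer, n * n)


if __name__ == "__main__":
    for k in range(7):
        lo, hi = inner_outer(k)
        print(k, float(lo), float(hi))
```

The printed inner areas increase and the outer areas decrease with $k$, as the two lemmas require, and both sequences bracket $\pi$.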


The definition of Jordan content is now clear; however, it is not yet evident that the
Jordan content of a set equals its usual area, not even for one of the original grid
squares! Most of the rest of this section is devoted to establishing the properties
of Jordan content. From that emerges its connection with area. (Exercise 8.7, for
example, establishes that the Jordan content of a square in $\mathcal{J}_k$ is indeed its area,
$1/2^{2k}$.) The first property is the fundamental one we saw in the two illustrative
examples: a set has Jordan content precisely when its boundary is “thin” enough to
have Jordan content equal to zero.

Theorem 8.2. The set $S$ is Jordan measurable $\iff$ $J(\partial S) = 0$.

Proof. We make use of the fact that

\[ S \text{ is Jordan measurable} \iff \lim_{k\to\infty} \bigl(\overline{J}_k(S) - \underline{J}_k(S)\bigr) = 0. \]

The number $\overline{J}_k(S) - \underline{J}_k(S)$ is the total area of the squares that meet $S$ but are not
entirely contained in $S$. Each such square thus contains a point $\mathbf{p}$ in $S$ and a point $\mathbf{q}$
not in $S$. Also, it is a convex set that contains the entire line segment connecting $\mathbf{p}$
and $\mathbf{q}$. Therefore, by Exercise 8.6, this square contains a point of $\partial S$, so it is counted
in $\overline{J}_k(\partial S)$; hence

\[ \overline{J}_k(S) - \underline{J}_k(S) \le \overline{J}_k(\partial S). \]

Conversely, suppose a square $Q_1$ in $\mathcal{J}_k$ contains a point $\mathbf{b}$ in $\partial S$. We claim $\mathbf{b}$ must
also lie in one of the squares $Q_2$ that is counted in $\overline{J}_k(S) - \underline{J}_k(S)$. If $\mathbf{b}$ is an interior
point of the square $Q_1$, then every sufficiently small open disk centered at $\mathbf{b}$ lies
in $Q_1$. But by Exercise 8.5, every such disk contains at least one point in $S$ and at
least one point not in $S$. Thus we can take $Q_2 = Q_1$.

If, on the contrary, $\mathbf{b}$ lies on either a side or a corner of $Q_1$, then it also meets
either one or three squares adjacent to $Q_1$ in $\mathcal{J}_k$. Because $\mathbf{b}$ is in $\partial S$, at least one of
these (two or four) squares contains a point in $S$ and at least one contains a point
not in $S$. For suppose each square contained exclusively one kind of point or the
other. Because $\mathbf{b}$ is in each of these squares, it must be both in $S$ and not in $S$. This
is impossible, so at least one square $Q_2$ contains both kinds of points; $Q_2$ is counted
in $\overline{J}_k(S) - \underline{J}_k(S)$.

Thus, each square $Q_1$ counted in $\overline{J}_k(\partial S)$ is either equal to a square $Q_2$ that is
counted in $\overline{J}_k(S) - \underline{J}_k(S)$, or it is one of the eight neighbors of $Q_2$ in the grid $\mathcal{J}_k$.
The total area of squares $Q_1$ is therefore not larger than nine times the total area of
squares $Q_2$:

\[ \overline{J}_k(\partial S) \le 9\bigl(\overline{J}_k(S) - \underline{J}_k(S)\bigr). \]

The two displayed inequalities now allow us to write

\[ \begin{aligned} S \text{ is Jordan measurable} &\iff \lim_{k\to\infty} \bigl(\overline{J}_k(S) - \underline{J}_k(S)\bigr) = 0 \\ &\iff \lim_{k\to\infty} \overline{J}_k(\partial S) = 0 \\ &\iff J(\partial S) = 0. \end{aligned} \] □

The next several results concern primarily outer Jordan content and sets whose
Jordan content is zero. They are useful on their own and they culminate in the
theorem (Theorem 8.10) that the graph of a continuous function on a bounded interval
has Jordan content zero.

Theorem 8.3. Let $S$ and $T$ be bounded sets with $S \subseteq T$; then $\overline{J}(S) \le \overline{J}(T)$.

Proof. Every square in $\mathcal{J}_k$ that meets $S$ also meets $T$, so $\overline{J}_k(S) \le \overline{J}_k(T)$. The
inequality is preserved in the limit as $k \to \infty$, so $\overline{J}(S) \le \overline{J}(T)$. □

Corollary 8.4 If $T$ has Jordan content zero, then so does every subset of $T$.

Proof. If $S \subseteq T$ and $J(T) = 0$, then $\overline{J}(S) = 0$. □

Theorem 8.5. If $S$ and $T$ are bounded sets, then $\overline{J}(S \cup T) \le \overline{J}(S) + \overline{J}(T)$.

Proof. Every square in $\mathcal{J}_k$ that meets $S \cup T$ meets either $S$ or $T$ (or both); thus

\[ \overline{J}_k(S \cup T) \le \overline{J}_k(S) + \overline{J}_k(T). \]

The inequality is preserved in the limit as $k \to \infty$. □

Corollary 8.6 If $S_1, \ldots, S_p$ are bounded sets, then

\[ \overline{J}(S_1 \cup \cdots \cup S_p) \le \overline{J}(S_1) + \cdots + \overline{J}(S_p). \] □

Corollary 8.7 The union of a finite number of sets that have Jordan content zero
also has Jordan content zero. □

In particular, every finite set of points has Jordan content zero.

Corollary 8.8 Suppose that, for any $\varepsilon > 0$, the set $S$ is contained in a finite number
of sets whose total Jordan content is less than $\varepsilon$. Then $S$ has Jordan content zero.

Proof. Suppose $S \subseteq T_1 \cup \cdots \cup T_p$ and

\[ J(T_1) + \cdots + J(T_p) = \overline{J}(T_1) + \cdots + \overline{J}(T_p) < \varepsilon. \]

Then $\overline{J}(S) < \varepsilon$ for every $\varepsilon > 0$, so $\overline{J}(S) = 0$. □

Theorem 8.9. The Jordan content of a square in $\mathcal{J}_k$ is its ordinary area, $1/2^{2k}$, and
the Jordan content of the rectangle $[a,b] \times [c,d]$ is its ordinary area $(b - a)(d - c)$.

Proof. See the exercises. □

We now introduce the notions of the floor and ceiling of a real number as convenient
tools for our work. They are used immediately in the next proof.

Definition 8.9 The floor of the real number $x$, denoted $\lfloor x \rfloor$, is the largest integer $m$
for which $m \le x$. The ceiling of $x$, denoted $\lceil x \rceil$, is the smallest integer $M$ for which
$x \le M$.



Theorem 8.10. The graph of a continuous function on a bounded interval has Jordan
content zero.

Proof. Let $y = f(x)$ be continuous on the interval $A \le x \le B$. We show that the
graph of $f$ is contained in a finite number of rectangles whose total area is less than
any preassigned $\varepsilon > 0$. Corollary 8.8 then implies that the graph has Jordan content
zero.

Because the interval is closed and bounded, $f$ is uniformly continuous (see a text
on analysis), which means that for any $\varepsilon > 0$, there is a $\delta > 0$ for which

\[ |u - v| < \delta \implies |f(u) - f(v)| < \frac{\varepsilon}{4(B - A)} \]

for any $A \le u, v \le B$. A bound like $\varepsilon/4(B - A)$ is chosen with hindsight, of course;
the reason emerges below. Let

\[ \alpha = \left\lfloor \frac{A}{\delta} \right\rfloor, \qquad \beta = \left\lceil \frac{B}{\delta} \right\rceil, \]

so that $\alpha\delta \le A < (\alpha + 1)\delta$ and $(\beta - 1)\delta < B \le \beta\delta$. Now partition the $x$-axis into
nonoverlapping closed intervals of length $\delta$, beginning at $\alpha\delta$ and ending at $\beta\delta$.
Then $A$ lies in the first interval and $B$ in the last. For clarity, we want these two to
be different intervals, so we require $(\alpha + 1)\delta < (\beta - 1)\delta$; it is sufficient to take
$2\delta < B - A$ (Exercise 8.9).

Suppose $u$ and $v$ lie in the same interval. If $x$ is the midpoint of that interval, then
$|u - x| \le \delta/2 < \delta$, $|v - x| \le \delta/2 < \delta$, so

\[ |f(u) - f(v)| \le |f(u) - f(x)| + |f(x) - f(v)| < \frac{\varepsilon}{4(B - A)} + \frac{\varepsilon}{4(B - A)} = \frac{\varepsilon}{2(B - A)}. \]

The inequalities imply that the graph of $y = f(x)$ is entirely contained inside $\beta - \alpha$
closed rectangles, each of which is $\delta$ units wide and $\varepsilon/2(B - A)$ units tall. (As the
figure below demonstrates, it may be necessary to take $x$ to be $A$ or $B$ in the first or
last interval, so there may be some overlapping at the ends.) The total area of the
rectangles is

\[ (\beta - \alpha) \times \delta \times \frac{\varepsilon}{2(B - A)}. \]



[Figure: the $x$-axis from $\alpha\delta$ to $\beta\delta$ divided into intervals of length $\delta$, with $A$ in the
first interval and $B$ ($= x$) in the last.]

But $(\beta - \alpha)\delta < B - A + 2\delta$; because we have also taken $2\delta < B - A$, it follows that
$(\beta - \alpha)\delta < 2(B - A)$ and thus

\[ \text{total area} = \frac{(\beta - \alpha)\delta}{2(B - A)}\,\varepsilon < \varepsilon. \]

The graph of $f$ is therefore contained in a finite number of sets whose total Jordan
content can be made smaller than any preassigned positive number $\varepsilon$. By
Corollary 8.8, the graph of $f$ has Jordan content zero. □
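
The bookkeeping in this proof can be traced numerically. The Python sketch below makes one simplifying assumption that is not in the proof: $f$ is Lipschitz on $[A, B]$ with a known constant, which gives an explicit $\delta$ in place of the one supplied by uniform continuity. The function name and its parameters are illustrative.

```python
import math


def cover_graph(A, B, eps, lipschitz):
    """Return (number of rectangles, total area) for the cover built in the proof.

    Assumes |f(u) - f(v)| <= lipschitz * |u - v| on [A, B].  Then
    delta <= eps / (4 * (B - A) * lipschitz) forces
    |f(u) - f(v)| < eps / (4 * (B - A)) whenever |u - v| < delta.
    The second bound keeps 2 * delta < B - A, as the proof requires.
    """
    delta = min(eps / (4 * (B - A) * lipschitz), (B - A) / 4)
    alpha = math.floor(A / delta)   # alpha * delta <= A < (alpha + 1) * delta
    beta = math.ceil(B / delta)     # (beta - 1) * delta < B <= beta * delta
    height = eps / (2 * (B - A))    # height of each covering rectangle
    count = beta - alpha
    return count, count * delta * height


if __name__ == "__main__":
    # Example: f = sin on [0, 3], which is Lipschitz with constant 1.
    print(cover_graph(0.0, 3.0, 0.1, 1.0))
```

In each case the total area returned should be below the chosen $\varepsilon$, in line with the estimate $(\beta - \alpha)\delta < 2(B - A)$.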

Corollary 8.11 Suppose $S$ is a bounded set in the $(x,y)$-plane whose boundary
consists of a finite number of curves, each of which is the graph of a continuous function
$y = f(x)$ or $x = \varphi(y)$. Then $S$ is Jordan measurable.

Proof. Each graph has Jordan content zero. There are only finitely many in $\partial S$, so
$\partial S$ likewise has Jordan content zero. By Theorem 8.2 (p. 282), $S$ itself is Jordan
measurable. □

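Theorem 8.12. If $S$ and $T$ are Jordan measurable sets, then so are $S \cup T$ and $S \cap T$,
and

\[ J(S \cup T) \le J(S) + J(T), \qquad J(S \cap T) \le J(S), \qquad J(S \cap T) \le J(T). \]

Proof. Each boundary point of either $S \cup T$ or $S \cap T$ is a boundary point of $S$ or of $T$:

\[ \partial(S \cup T) \subseteq \partial S \cup \partial T, \qquad \partial(S \cap T) \subseteq \partial S \cup \partial T. \]

By hypothesis, $\partial S$ and $\partial T$ have Jordan content zero; hence so does their union
(Corollary 8.7), and so do its subsets $\partial(S \cup T)$ and $\partial(S \cap T)$ (Corollary 8.4).
Consequently, $S \cup T$ and $S \cap T$ are both Jordan measurable. Theorem 8.5 then implies

\[ J(S \cup T) = \overline{J}(S \cup T) \le \overline{J}(S) + \overline{J}(T) = J(S) + J(T). \]

Because $S \cap T \subseteq S$, Theorem 8.3 implies $J(S \cap T) = \overline{J}(S \cap T) \le \overline{J}(S) = J(S)$. Finally,
$S \cap T \subseteq T$, so the same argument gives $J(S \cap T) \le J(T)$. □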

Definition 8.10 Two sets overlap if their interiors have a nonempty intersection. 
They are nonoverlapping if their interiors are disjoint. 

Theorem 8.13. If $S$ and $T$ are bounded Jordan measurable sets that do not overlap,
then

\[ J(S \cup T) = J(S) + J(T). \]

Proof. Because $S$ and $T$ have disjoint interiors, a grid square $Q$ that counts in the
area $\underline{J}_k(S)$ cannot be entirely contained in $T$, so it does not count in $\underline{J}_k(T)$. Similarly,
a grid square that counts in $\underline{J}_k(T)$ does not count in $\underline{J}_k(S)$. Of course, every
grid square that counts in one or the other counts in $\underline{J}_k(S \cup T)$, so

\[ \underline{J}_k(S) + \underline{J}_k(T) \le \underline{J}_k(S \cup T). \]

In the limit, $\underline{J}(S) + \underline{J}(T) \le \underline{J}(S \cup T)$, and then $J(S) + J(T) \le J(S \cup T)$ because $S$, $T$,
and $S \cup T$ are Jordan measurable. Finally, with Theorem 8.12 we have

\[ J(S) + J(T) = J(S \cup T). \] □

This leads immediately to the finite additivity of Jordan content.

Corollary 8.14 If $S_1, \ldots, S_p$ are Jordan measurable sets, and no two overlap, then
$S_1 \cup \cdots \cup S_p$ is Jordan measurable and

\[ J(S_1 \cup \cdots \cup S_p) = J(S_1) + \cdots + J(S_p). \] □

When sets overlap, there is still a definite relation between the area of their union
and their individual areas. We make use of the set difference $T \setminus S$; this is the set
of points in $T$ that are not in $S$ (i.e., $T \setminus S = T \cap S^c$). The definition does not assume
$S \subseteq T$.

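Lemma 8.3. Suppose $T$ and $S \subseteq T$ are Jordan measurable; then $T \setminus S$ is Jordan
measurable and $J(T \setminus S) = J(T) - J(S)$.

Proof. Because $\partial(T \setminus S) \subseteq \partial T \cup \partial S$ (see Exercise 8.14.b), $\partial(T \setminus S)$ has Jordan content
zero and $T \setminus S$ is Jordan measurable. Because $T = (T \setminus S) \cup S$ and $T \setminus S$ and $S$ do not
overlap (they are disjoint), we have

\[ J(T) = J(T \setminus S) + J(S). \] □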

Theorem 8.15. Suppose $S$ and $T$ are Jordan measurable; then

\[ J(S \cup T) + J(S \cap T) = J(S) + J(T). \]

Proof. Because $S \subseteq (S \cup T)$ and $(S \cap T) \subseteq T$, and all these are Jordan measurable,
the lemma applies to both set differences

\[ (S \cup T) \setminus S = T \setminus (S \cap T). \]

Because the set differences are equal, the lemma implies

\[ J(S \cup T) - J(S) = J(T) - J(S \cap T); \]

to get the theorem, just rearrange the terms. □

Another way to state the theorem that perhaps indicates more explicitly how the
overlap $S \cap T$ affects the area of the union is

\[ J(S \cup T) = J(S) + J(T) - J(S \cap T). \]
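
For a quick sanity check of this formula, take two rectangles with integer corners, so that each one is a union of nonoverlapping unit grid squares and its Jordan content is just the number of those squares. The Python sketch below represents each rectangle by its set of unit cells; the particular rectangles and the helper name are illustrative assumptions.

```python
def cells(x0, x1, y0, y1):
    """Unit grid squares [i, i+1] x [j, j+1] that make up [x0, x1] x [y0, y1]."""
    return {(i, j) for i in range(x0, x1) for j in range(y0, y1)}


S = cells(0, 4, 0, 3)  # rectangle [0, 4] x [0, 3], content 12
T = cells(2, 6, 1, 5)  # rectangle [2, 6] x [1, 5], content 16

union, overlap = S | T, S & T
print(len(union), len(S) + len(T) - len(overlap))  # prints: 24 24
```

The overlap here is the rectangle $[2,4] \times [1,3]$ with content 4, which is exactly the amount counted twice in $J(S) + J(T)$.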

We can equate the integral of a function with the Jordan content of the region
under its graph, making a useful connection between area and Jordan content.

Theorem 8.16. Let $y = f(x)$ be continuous and nonnegative on $a \le x \le b$, and let $S$
be the region in the $(x,y)$-plane bounded by the vertical lines $x = a$ and $x = b$, the
$x$-axis, and the graph of $f$. Then $S$ is Jordan measurable and

\[ J(S) = \int_a^b f(x)\,dx. \]

Proof. The boundary of $S$ has Jordan content zero (Theorem 8.10), so $S$ is Jordan
measurable.

To get estimates for the integral off, subdivide the interval a<x<b into K equal 
pieces I\, ...,1k, each of length Ax = (b — a) /K. Let mj and Mj the the minimum 
and maximum values of y = f(x) on If, then 

f b 

m [Axd- \-mKAx< / f(x)dx<M iAxJ- \-MkAx , 

J a 

and these bounds converge to the value of the integral as K —> 

Now let rj be the rectangle with base Ij and height mj, and let Rj be the rectangle 
with the same base but height Mj. Then r\, ..., rK are nonoverlapping rectangles 
whose union is contained in S, and R\, ...,Rk are nonoverlapping rectangles whose 
union contains S: 

r\ U • • • C 5 C R\ U • • • U Rk- 
By the finite additivity of Jordan content (Corollary 8.14), 


J(riU---yjr K )=J(r x )-\ -h J(r k ), 

J(R\U-- -U Rk) =J{R \) H-b J(Rk), 


and the set inclusions imply 

An) + ■ • ■ +J( n ) < j(s) < j(R\) + ■ ■ ■+J(R k ). 
But J(rj) = mj Ax and J(Rj) = MjAx, so 


m\ Ax H -b wtjfAx < J(S) < M\Ax H- b MkAx. 

Thus, J(S) has the same bounds as the integral; these bounds therefore converge 
to J(S) as well as to the integral. □ 
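The squeeze in this proof can be watched numerically. The sketch below assumes an increasing function, so that the minimum and maximum on each subinterval sit at its left and right endpoints; the function $x^2$ on $0 \le x \le 2$ and the name `rectangle_bounds` are our own choices for illustration.

```python
def rectangle_bounds(f, a, b, K):
    """Total areas of the inscribed and circumscribed rectangles for an
    increasing function f on [a, b], using K equal subintervals."""
    dx = (b - a) / K
    # f is increasing: its minimum on each piece is at the left end,
    # its maximum at the right end
    lower = sum(f(a + j * dx) * dx for j in range(K))
    upper = sum(f(a + (j + 1) * dx) * dx for j in range(K))
    return lower, upper

f = lambda x: x * x      # hypothetical example function
exact = 8 / 3            # integral of x^2 over [0, 2]

for K in (10, 100, 1000):
    lower, upper = rectangle_bounds(f, 0.0, 2.0, K)
    print(K, lower, upper, upper - lower)
```

For an increasing function the gap between the two bounds is $(f(b) - f(a))\,\Delta x$, which here is $8/K$, so both bounds close in on the common value.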

[Figure: the graph of $y = f(x)$ over $a \le x \le b$, with a subinterval $I_j$ marked on the $x$-axis.]

In elementary geometry, congruent figures have the same area; we now prove
they have the same Jordan content, too. We need to show that Jordan content is
preserved by the translations, rotations, and reflections that link congruent figures.
Such invariance is not immediately obvious, because we have restricted ourselves
to a single collection of grids $\mathcal{J}_k$. For example, a translate of $S$ does not, in general,
have the same relation to $\mathcal{J}_k$ that $S$ itself does. However, if we translate the grid
as well and can show that the translated grid yields the same Jordan content as the
original grid, then it follows that Jordan content is invariant under translations. This
leads us to the task of showing that the Jordan content of a set can be determined
just as well by many other grids. We concentrate on grids obtained from $\mathcal{J}_k$ by a
Euclidean motion or, more generally, by a coordinate change.

For simplicity, we begin with a Euclidean motion $E : \mathbb{R}^2 \to \mathbb{R}^2$. The action of $E$
is given by an orthogonal matrix $R$ (either a rotation or a reflection) followed by a
translation. We can write formulas for $E$ and its inverse as

$$\mathbf{u} = E(\mathbf{x}) = R\mathbf{x} + \mathbf{a}, \qquad \mathbf{x} = E^{-1}(\mathbf{u}) = R^{-1}(\mathbf{u} - \mathbf{a}) = R^{\dagger}\mathbf{u} + \mathbf{b},$$

where $\mathbf{b} = -R^{\dagger}\mathbf{a}$ and $R^{\dagger} = R^{-1}$ because $R$ is orthogonal (here $R^{\dagger}$ is the transpose of $R$).
If $S$ is a bounded set in the
plane, then its image $E(S)$ under a Euclidean motion stands in the same relation to
the grid $\mathcal{J}_k$ that $S$ itself does to the new grid $\mathcal{H}_k = E^{-1}(\mathcal{J}_k)$. Each time a square $Q$
in $\mathcal{H}_k$ counts in estimating the area of $S$, its image $E(Q)$ in $\mathcal{J}_k$ counts in estimating
the area of $E(S)$.

But there is still a problem, because we do not know that $Q$ has the same Jordan
content as $E(Q)$. (This is precisely the question we are trying to settle: the
invariance of Jordan content under Euclidean motions!) Nevertheless, $Q$ is certainly
Jordan measurable, because $\partial Q$ is just a finite collection of line segments with Jordan
content zero. Making no assumption about the value of $J(Q)$, we now construct
the analogues of $\underline{J}_k$, $\overline{J}_k$, $\underline{J}$, and $\overline{J}$ for the grids $\mathcal{H}_k$ (cf. Definition 8.6, page 281).

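Definition 8.11 Let $\underline{H}_k(S)$ denote the total Jordan content of the squares in $\mathcal{H}_k$ that
are entirely contained in the bounded set $S$, and let $\overline{H}_k(S)$ denote the total Jordan
content of the squares in $\mathcal{H}_k$ that intersect $S$.

We can express these very compactly using set and summation notation:

$$\underline{H}_k(S) = \sum_{\substack{Q \in \mathcal{H}_k \\ Q \subseteq S}} J(Q), \qquad \overline{H}_k(S) = \sum_{\substack{Q \in \mathcal{H}_k \\ Q \cap S \ne \emptyset}} J(Q).$$

The values of $\underline{H}_k(S)$ and $\overline{H}_k(S)$ are nested in the same way as the values of $\underline{J}_k(S)$
and $\overline{J}_k(S)$; this gives us the limits

$$\underline{H}(S) = \lim_{k \to \infty} \underline{H}_k(S), \qquad \overline{H}(S) = \lim_{k \to \infty} \overline{H}_k(S),$$

and the inequality $\underline{H}(S) \le \overline{H}(S)$.

H content

Definition 8.12 If $\underline{H}(S) = \overline{H}(S)$, we say that $S$ is H measurable and we define the
H content of $S$ to be $H(S) = \underline{H}(S) = \overline{H}(S)$.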

Lemma 8.4. Suppose $S$ is Jordan measurable; then $\underline{H}_k(S) \le J(S) \le \overline{H}_k(S)$ for every
$k = 0, 1, 2, \ldots$.

Proof. Suppose $Q_1, \ldots, Q_p$ are the squares of $\mathcal{H}_k$ that are counted in $\underline{H}_k(S)$. Because
they are nonoverlapping and $Q_1 \cup \cdots \cup Q_p \subseteq S$, the finite additivity of $J$ implies

$$\underline{H}_k(S) = J(Q_1) + \cdots + J(Q_p) = J(Q_1 \cup \cdots \cup Q_p) \le J(S).$$

The inequality $J(S) \le \overline{H}_k(S)$ is proven in a similar way. □

Corollary 8.17 Suppose a Jordan measurable set $S$ is also H measurable; then
$H(S) = J(S)$. □

But must a J-measurable set also be H measurable? The corollary directs our attention
to the difference $\overline{H}_k(S) - \underline{H}_k(S)$, because

$$S \text{ is H measurable} \iff \lim_{k \to \infty} \bigl(\overline{H}_k(S) - \underline{H}_k(S)\bigr) = 0$$

(cf. the comment at the beginning of the proof of Theorem 8.2, p. 282). Therefore,
if we can show that $\overline{H}_k(S) - \underline{H}_k(S)$ is smaller than any preassigned $\varepsilon > 0$ when $k$ is
sufficiently large, we shall have shown that $H(S) = J(S)$. To complete our argument,
it is useful to introduce the notion of a tubular neighborhood.

Definition 8.13 If $S$ is a bounded set, the tubular neighborhood of $S$ of width $w > 0$
is the set $T$ of points that are within distance $w$ of some point of $S$.

The tubular neighborhood of S of width w is the union of the open disks of radius w 
centered at all the points of S. If S is a smooth curve in space, and w is small enough, 
then the tubular neighborhood looks like a tube with S at its core; together, they 
resemble a coaxial cable. 

Lemma 8.5. Suppose $S$ has Jordan content zero and $\varepsilon > 0$ is given. Then $S$ has a
tubular neighborhood $T$ of some width $\delta > 0$ for which $J(T) < \varepsilon$.

Proof. Because $J(S) = 0$, we know $\overline{J}_k(S) \to 0$ as $k \to \infty$. Choose $K$ so large that
$\overline{J}_K(S) < \varepsilon/9$. The squares $Q$ in $\mathcal{J}_K$ that are counted in $\overline{J}_K(S)$ cover $S$ and have total
Jordan content less than $\varepsilon/9$.

Define $T$ to be the tubular neighborhood of $S$ of width $\delta = 1/2^K$, and let $\mathbf{q}$ be a
point in $T$. By definition, $\mathbf{q}$ is within distance $\delta = 1/2^K$ of some point $\mathbf{p}$ in $S$. Let $Q$
be a square counted in $\overline{J}_K(S)$ that contains $\mathbf{p}$. Because the squares in $\mathcal{J}_K$ are closed
and have width $\delta = 1/2^K$, the point $\mathbf{q}$ is either in $Q$ or in one of the eight neighbors
of $Q$. Now $\mathbf{q}$ is an arbitrary point of $T$, so $T$ is covered by the squares $Q$ and their
immediate neighbors, whose total Jordan content is less than $9 \times \varepsilon/9 = \varepsilon$; hence
$J(T) \le \overline{J}_K(T) < \varepsilon$. □

[Figure: a set $S$ and a tubular neighborhood of $S$.]

Theorem 8.18. Suppose $S$ is Jordan measurable; then it is H measurable and

$$J(S) = H(S).$$

Proof. By Corollary 8.17 and the discussion following it, we need only show that,
given any $\varepsilon > 0$, there is a $K$ such that

$$\overline{H}_K(S) - \underline{H}_K(S) < \varepsilon.$$

Note: the sequence $\overline{H}_k(S) - \underline{H}_k(S)$ decreases monotonically as $k$ increases, so we
also have $\overline{H}_k(S) - \underline{H}_k(S) < \varepsilon$ for all $k \ge K$.

By hypothesis, $S$ is Jordan measurable, so $\partial S$ has Jordan content zero. Therefore,
using the given $\varepsilon > 0$ and Lemma 8.5, we know $\partial S$ has a tubular neighborhood $T$
of some width $\delta > 0$ for which $J(T) < \varepsilon$. Now choose $K$ so that the diameter (see
below, Definition 8.14) of any grid square $Q$ in $\mathcal{H}_K$ is less than $\delta$. The diameter of
$Q$ is its diagonal length $\sqrt{2}/2^K < 1/2^{K-1}$, so it is sufficient to choose

$$K \ge 1 + \log_2(1/\delta).$$

Consider the difference $\overline{H}_K(S) - \underline{H}_K(S)$. Adapting the arguments of the proof of
Theorem 8.2 from $\mathcal{J}_k$ to $\mathcal{H}_K$, we draw two conclusions:

• $\overline{H}_K(S) - \underline{H}_K(S)$ is the total Jordan content of the squares $Q$ in $\mathcal{H}_K$ that meet $S$
but are not entirely contained in $S$.

• Each such square $Q$ contains a point of $\partial S$.

The diameter of $Q$ is less than $\delta$ (by construction); thus the entire square $Q$ lies
within the tubular neighborhood $T$. Hence, the total Jordan content of the squares $Q$
counted in $\overline{H}_K(S) - \underline{H}_K(S)$ is less than the outer Jordan content of $T$; that is,

$$\overline{H}_K(S) - \underline{H}_K(S) < \varepsilon. \qquad \square$$
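The choice of $K$ in this proof is easy to compute. The short sketch below takes a width $\delta$, returns the smallest nonnegative integer $K$ with $K \ge 1 + \log_2(1/\delta)$, and confirms that a grid square of side $1/2^K$ then has diameter less than $\delta$. The function names and the sample widths are our own assumptions for illustration.

```python
import math

def grid_index_for_width(delta):
    """Smallest nonnegative integer K with K >= 1 + log2(1/delta)."""
    return max(0, math.ceil(1 + math.log2(1 / delta)))

def square_diameter(K):
    """Diagonal length of a square of side 1/2**K."""
    return math.sqrt(2) / 2**K

for delta in (0.5, 0.1, 0.01, 0.001):   # hypothetical sample widths
    K = grid_index_for_width(delta)
    print(delta, K, square_diameter(K))
```

The bound is not the tightest possible, because it replaces $\sqrt{2}$ by 2, but it is all the proof requires.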


Jordan content of 
congruent figures 


Jordan content and 
ordinary area 


General grids $\mathcal{G}_k$
and G content


One of our main objectives, to show that congruent figures have the same Jordan 
content, now follows as an immediate corollary. 

Corollary 8.19 If $S$ is Jordan measurable and $E : \mathbb{R}^2 \to \mathbb{R}^2$ is a Euclidean motion,
then $E(S)$ is Jordan measurable and $J(E(S)) = J(S)$.

Proof. $J(E(S)) = H(S) = J(S)$. □

Thus, if a rectangle $R$ with sides of length $l$ and $w$ lies anywhere in the $(x,y)$-plane,
a Euclidean motion $E$ will transform it into the rectangle $E(R)$: $0 \le x \le l$, $0 \le y \le w$.
By Exercise 8.8, $J(E(R)) = lw$; therefore, $J(R) = lw$. If $P$ is a parallelogram
with base $b$ and height $h$, then $P$ can be decomposed into nonoverlapping sets $A$ and
$B$ so that $A$ and a translate $E(B)$ are nonoverlapping and form a rectangle $R$ with the
same base and height. Therefore,

$$J(P) = J(A) + J(B) = J(A) + J(E(B)) = J(R) = bh.$$

If $T$ is a triangle with base $b$ and height $h$, then similar geometric arguments show
that $J(T) = \tfrac{1}{2}bh$. Because a polygon can be written as a union of nonoverlapping
triangles, it follows that the Jordan content of any polygon equals its ordinary area.
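The decomposition into triangles can be carried out explicitly for a convex polygon. The sketch below is an illustration under our own assumptions: it fans triangles out from the first vertex, adds their areas, and compares the total with the familiar shoelace formula for the area of a polygon. The helper names and the sample polygons are hypothetical.

```python
def triangle_area(p, q, r):
    """Half the absolute cross product of q - p and r - p, i.e. (1/2) * base * height."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / 2

def fan_area(vertices):
    """Area of a convex polygon as the sum of the triangles fanned out from vertices[0]."""
    p0 = vertices[0]
    return sum(triangle_area(p0, vertices[i], vertices[i + 1])
               for i in range(1, len(vertices) - 1))

def shoelace_area(vertices):
    """Ordinary area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice) / 2

parallelogram = [(0, 0), (4, 0), (5, 3), (1, 3)]       # base 4, height 3
pentagon = [(0, 0), (4, 0), (5, 2), (2, 5), (-1, 2)]   # a convex pentagon

for polygon in (parallelogram, pentagon):
    print(fan_area(polygon), shoelace_area(polygon))
```

The fan works only for convex polygons; a nonconvex polygon needs a more careful triangulation, although the conclusion in the text is the same.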

Under what conditions will a more general collection $\mathcal{G}_k$ ($k = 0, 1, 2, \ldots$) of grids
on the plane determine Jordan content? We assume that each grid is a refinement of
its predecessor, and also that the individual cells $P$ of a grid are closed, connected,
nonoverlapping sets with positive Jordan content that together cover $\mathbb{R}^2$. The cells
need not be congruent or even straight-sided. We define:

• $\underline{G}_k(S)$ is the total Jordan content of the cells $P$ of $\mathcal{G}_k$ that are entirely contained
in $S$;

• $\overline{G}_k(S)$ is the total Jordan content of the cells $P$ of $\mathcal{G}_k$ that meet $S$.

$$\underline{G}_k(S) = \sum_{\substack{P \in \mathcal{G}_k \\ P \subseteq S}} J(P), \qquad \overline{G}_k(S) = \sum_{\substack{P \in \mathcal{G}_k \\ P \cap S \ne \emptyset}} J(P).$$

Because $\mathcal{G}_k$ refines its predecessor, $\underline{G}_k(S)$ increases monotonically with $k$ to its limiting
value $\underline{G}(S)$, the inner G content of $S$. Likewise, $\overline{G}_k(S)$ decreases monotonically
to the outer G content, $\overline{G}(S)$, and

$$\underline{G}_k(S) \le \underline{G}(S) \le \overline{G}(S) \le \overline{G}_k(S).$$

If the inner and outer G content are equal, we say $S$ is G measurable and has G
content $G(S) = \underline{G}(S) = \overline{G}(S)$. Under what conditions will $G(S) = J(S)$? For H
content, the answer was provided by Theorem 8.18, whose proof involved the linear
dimensions (i.e., diameters) of the grid elements.

Definition 8.14 The diameter of the set $S$, denoted $\delta(S)$, is the maximum distance
between any two points in its closure $\overline{S}$. The mesh size $\|\mathcal{G}\|$ of a grid $\mathcal{G}$ is the
smallest upper bound on the size of the diameters of the elements $P$ of $\mathcal{G}$.

Theorem 8.20. If $\|\mathcal{G}_k\| \to 0$ as $k \to \infty$, then every Jordan-measurable set $S$ is
G measurable, and $J(S) = G(S)$.

Proof. This argument imitates the proof of Theorem 8.18 and the lemmas preceding
it. First of all, because $S$ is Jordan measurable,

$$\underline{G}_k(S) \le J(S) \le \overline{G}_k(S)$$

(cf. Lemma 8.4). Hence, to prove the theorem it suffices to show

$$\lim_{k \to \infty} \bigl(\overline{G}_k(S) - \underline{G}_k(S)\bigr) = 0.$$

Suppose $\varepsilon > 0$ is given; we wish to show that there is an integer $K = K(\varepsilon)$ for
which

$$\overline{G}_k(S) - \underline{G}_k(S) < \varepsilon$$

for all $k \ge K$. Because $S$ is Jordan measurable, $J(\partial S) = 0$ and $\partial S$ has a tubular
neighborhood $T$ of some width $\delta$ for which $J(T) < \varepsilon$ (Lemma 8.5). Because $\|\mathcal{G}_k\| \to 0$
as $k \to \infty$, we can choose $K$ so that $\|\mathcal{G}_k\| < \delta$ for all $k \ge K$. Each cell $P$ of the grid
$\mathcal{G}_k$ that is counted in the difference $\overline{G}_k(S) - \underline{G}_k(S)$ contains a point $\mathbf{p}$ in $S$ and a
point $\mathbf{q}$ not in $S$. Because $P$ is connected, there is
a continuous path in $P$ from $\mathbf{p}$ to $\mathbf{q}$, and that path must contain a point of $\partial S$
(Exercise 8.6). If $k \ge K$, then the diameter of $P$ is less than $\delta$, so $P$ is entirely contained
within the tubular neighborhood $T$. Hence the total Jordan content of the cells $P$
counted in $\overline{G}_k(S) - \underline{G}_k(S)$ is less than $J(T) < \varepsilon$; that is,

$$\overline{G}_k(S) - \underline{G}_k(S) < \varepsilon. \qquad \square$$

Diameter $\delta(S)$ and mesh size $\|\mathcal{G}\|$

Area magnification by a linear map

The multiplier for Jordan content

We are now able to extend to Jordan content our earlier observation about the
area magnification factor of a linear map (Theorem 2.8, p. 42). We make frequent
use of the fact that the Jordan content of a polygon $P$ is its ordinary (absolute) area;
thus, if $L(P)$ is its polygonal image under a linear map $L$, then

$$J(L(P)) = |\det L|\, J(P).$$

Lemma 8.6. If $L : \mathbb{R}^2 \to \mathbb{R}^2$ is linear and $J(S) = 0$, then $J(L(S)) = 0$.

Proof. If $L$ is not invertible, the proof is immediate, because the whole image $L(\mathbb{R}^2)$
lies in a line, so $L(S)$ is a bounded subset of that line and automatically has Jordan
content zero.

If $L$ is invertible (i.e., $\det L \ne 0$), then we show that $L(S)$ is contained in a union
of sets whose total Jordan content is less than any positive number $\varepsilon$. The lemma
then follows from Corollary 8.8 (p. 283).

By hypothesis, $J(S) = 0$, so $\overline{J}_K(S) < \varepsilon/|\det L|$ for some integer $K$. That is, $S$
is covered by squares $Q$ whose total Jordan content is less than $\varepsilon/|\det L|$. Therefore
$L(S)$ is covered by the images $L(Q)$ of those squares. Because $J(L(Q)) = |\det L|\, J(Q)$,
the total Jordan content of the sets covering $L(S)$ is less than $\varepsilon$. □

Corollary 8.21 The image of a Jordan-measurable set under a linear map is Jordan
measurable. □

We can now show that the Jordan content multiplier of a linear map is the absolute
value of its determinant.

Theorem 8.22. Suppose $S$ is Jordan measurable and $L : \mathbb{R}^2 \to \mathbb{R}^2$ is linear; then
$J(L(S)) = |\det L|\, J(S)$.

Proof. If $L$ is not invertible, then $\det L = 0$, and $L(S)$ is a bounded subset of the line
$L(\mathbb{R}^2)$. Thus $J(L(S)) = 0 = |\det L|\, J(S)$.




For an invertible map $L$, we adapt the argument we used to prove that a Euclidean
motion preserves Jordan content (pp. 287-290). The key to the argument is to note
that $L(S)$ has the same relation to the grid $\mathcal{J}_k$ that $S$ itself does to the new grid
$\mathcal{G}_k = L^{-1}(\mathcal{J}_k)$ (see the figure below). Of course, to use the G-content functions
associated with $\mathcal{G}_k$ (as defined above, p. 291), we want Theorem 8.20 to hold. We
must therefore check that the maximum diameter $\|\mathcal{G}_k\|$ of a cell $P$ of $\mathcal{G}_k$ tends to
zero as $k \to \infty$. Exercise 8.16 establishes that the diameters of $Q$ and $P = L^{-1}(Q)$ are
linked by a constant $\sigma$ that depends only on $L^{-1}$ and not on $k$: $\delta(P) = \sigma\,\delta(Q)$. Thus
all cells $P$ of $\mathcal{G}_k$ have the same diameter: $\|\mathcal{G}_k\| = \sigma\,\delta(Q)$. Because $\delta(Q) = \sqrt{2}/2^k$,
it follows that $\|\mathcal{G}_k\| \to 0$ as $k \to \infty$. By Theorem 8.20, the given Jordan-measurable
set $S$ is also G measurable, and $J(S) = G(S)$.

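As $P$ is contained in $S$ precisely when $Q = L(P)$ is contained in $L(S)$, we have
(using $P = L^{-1}(Q)$ as well)

$$\underline{G}_k(S) = \sum_{\substack{P \in \mathcal{G}_k \\ P \subseteq S}} J(P) = \sum_{\substack{Q \in \mathcal{J}_k \\ Q \subseteq L(S)}} J(L^{-1}(Q)).$$

Because $Q$ is a polygon, $J(L^{-1}(Q)) = |\det L^{-1}|\, J(Q)$, so the last sum becomes

$$|\det L^{-1}| \sum_{\substack{Q \in \mathcal{J}_k \\ Q \subseteq L(S)}} J(Q) = |\det L^{-1}|\, \underline{J}_k(L(S)),$$

or just $\underline{G}_k(S) = |\det L^{-1}|\, \underline{J}_k(L(S))$. Because $|\det L^{-1}| = |\det L|^{-1}$, we have

$$\underline{J}_k(L(S)) = |\det L|\, \underline{G}_k(S).$$

In the limit as $k \to \infty$, $J(L(S)) = |\det L|\, G(S) = |\det L|\, J(S)$. □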
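For polygons, where Jordan content is ordinary area, the multiplier can be checked directly. The sketch below is only an illustration: the pentagon `P`, the three sample matrices (the last one singular), and the helper names are our own assumptions. It maps the vertices of the polygon by each linear map and compares the area of the image with $|\det L|$ times the original area.

```python
def shoelace_area(vertices):
    """Ordinary area of a polygon by the shoelace formula."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice) / 2

def apply_linear(L, p):
    """Image of the point p under the 2 x 2 matrix L = ((a, b), (c, d))."""
    (a, b), (c, d) = L
    return (a * p[0] + b * p[1], c * p[0] + d * p[1])

def det(L):
    (a, b), (c, d) = L
    return a * d - b * c

P = [(0, 0), (4, 0), (5, 2), (2, 5), (-1, 2)]                    # hypothetical polygon
maps = [((2, 1), (1, 3)), ((1, 2), (3, -1)), ((1, 2), (2, 4))]   # last map is singular

for L in maps:
    image = [apply_linear(L, p) for p in P]
    print(det(L), shoelace_area(image), abs(det(L)) * shoelace_area(P))
```

The second map has a negative determinant, so it reverses orientation; the absolute value in the formula is what keeps the area nonnegative.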

When we take up integration in the next section, we need an even larger class of
grids than sequences $\mathcal{G}_0, \mathcal{G}_1, \mathcal{G}_2, \ldots$ of successive refinements of the sort we have
considered so far. The nonnegative integers that we use to index these grids have a
natural order that is imparted to the grids themselves: there is a “first” grid, then a
“second,” and so on. When we say that each grid refines its predecessor, we make
implicit use of that order.

When we enlarge the class of grids, however, the larger collection $\{\mathcal{G}\}$ typically
has no natural ordering. Thus, even though one grid in the collection may be a
refinement of another, the notion of “predecessor” is now missing, and we are no
longer able to say that a grid refines its predecessor. Nevertheless, we still assume the
cells $P$ in any grid are closed, connected, nonoverlapping sets with positive Jordan
content, and together they cover $\mathbb{R}^2$. Then we define (exactly as we did for the more
restricted class of grids $\mathcal{G}_k$):

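• $\underline{G}_{\mathcal{G}}(S)$ is the total Jordan content of the cells $P$ of $\mathcal{G}$ that are entirely contained
in $S$.

• $\overline{G}_{\mathcal{G}}(S)$ is the total Jordan content of the cells $P$ of $\mathcal{G}$ that meet $S$.

$$\underline{G}_{\mathcal{G}}(S) = \sum_{\substack{P \in \mathcal{G} \\ P \subseteq S}} J(P), \qquad \overline{G}_{\mathcal{G}}(S) = \sum_{\substack{P \in \mathcal{G} \\ P \cap S \ne \emptyset}} J(P).$$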

Grids for integration

$\underline{G}_{\mathcal{G}}(S)$ and $\overline{G}_{\mathcal{G}}(S)$ vary with $\|\mathcal{G}\|$

We should next get inner and outer G content (i.e., $\underline{G}(S)$ and $\overline{G}(S)$). When grids
were indexed by integers $k = 0, 1, 2, \ldots$, we just took limits of $\underline{G}_k(S)$ and $\overline{G}_k(S)$ as
$k \to \infty$ (and monotonicity guaranteed that the limits existed). But for an arbitrary
collection of grids $\{\mathcal{G}\}$, there is no index that supplies an ordering. Nevertheless, if
we consider how $\underline{G}_{\mathcal{G}}(S)$ and $\overline{G}_{\mathcal{G}}(S)$ vary with mesh size $\|\mathcal{G}\|$, we see that they do
have well-defined limits, at least if $S$ has Jordan content. We then have a way, once
again, to define G content.

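To see how this happens, suppose $S$ is Jordan measurable. Then, for any grid $\mathcal{G}$,

$$\underline{G}_{\mathcal{G}}(S) \le J(S) \le \overline{G}_{\mathcal{G}}(S)$$

(cf. Lemma 8.4). Therefore, if we can show that

$$\lim_{\|\mathcal{G}\| \to 0} \bigl(\overline{G}_{\mathcal{G}}(S) - \underline{G}_{\mathcal{G}}(S)\bigr) = 0,$$

then we know the following limits exist:

$$\underline{G}(S) = \lim_{\|\mathcal{G}\| \to 0} \underline{G}_{\mathcal{G}}(S) = J(S), \qquad \overline{G}(S) = \lim_{\|\mathcal{G}\| \to 0} \overline{G}_{\mathcal{G}}(S) = J(S).$$

We can then say

• $S$ is G measurable.

• $G(S) = \underline{G}(S) = \overline{G}(S)$.

• $J(S) = G(S)$.

Hence, for any given $\varepsilon > 0$, we must show there is a $\delta > 0$ for which

$$\|\mathcal{G}\| < \delta \implies \overline{G}_{\mathcal{G}}(S) - \underline{G}_{\mathcal{G}}(S) < \varepsilon.$$

By Lemma 8.5 we know that $\partial S$ has a tubular neighborhood $T$ of some positive
width $\delta$ for which $J(T) < \varepsilon$. Now suppose $\|\mathcal{G}\| < \delta$. Then, by essentially the same
argument as in the proof of Theorem 8.20, the total Jordan content of the cells $P$ that
are counted in $\overline{G}_{\mathcal{G}}(S) - \underline{G}_{\mathcal{G}}(S)$ is less than $J(T) < \varepsilon$. In other words,

$$\|\mathcal{G}\| < \delta \implies \overline{G}_{\mathcal{G}}(S) - \underline{G}_{\mathcal{G}}(S) < \varepsilon,$$

as required. We state the conclusion as a theorem.

Theorem 8.23. If $\{\mathcal{G}\}$ is an infinite collection of integration grids whose mesh sizes
$\|\mathcal{G}\|$ come arbitrarily close to zero, then G content is defined for all sets $S$ that have
Jordan content, and $G(S) = J(S)$. □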
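A numerical experiment makes the theorem concrete. The sketch below is an illustration under our own assumptions: $S$ is the closed unit disk, and each grid consists of squares of side $h$ (mesh size $h\sqrt{2}$), with no grid required to refine another. The function name `disk_content_bounds` and the sample values of $h$ are hypothetical. The total area of the squares inside the disk and of the squares meeting the disk should bracket $\pi$ and close in on it as $h$ shrinks.

```python
import math

def disk_content_bounds(h):
    """Total area of the grid squares of side h that lie inside the closed unit
    disk, and of those that meet it."""
    n = math.ceil(1 / h) + 1      # squares [i*h, (i+1)*h], -n <= i < n, cover the disk
    inner = outer = 0
    for i in range(-n, n):
        for j in range(-n, n):
            x0, x1, y0, y1 = i * h, (i + 1) * h, j * h, (j + 1) * h
            # squared distance from the origin to the farthest corner of the square
            far = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # squared distance from the origin to the nearest point of the square
            nx = min(max(0.0, x0), x1)
            ny = min(max(0.0, y0), y1)
            near = nx * nx + ny * ny
            if far <= 1:          # the square lies entirely in the disk
                inner += 1
            if near <= 1:         # the square meets the disk
                outer += 1
    return inner * h * h, outer * h * h

for h in (0.3, 0.1, 0.03):
    lower, upper = disk_content_bounds(h)
    print(h, lower, upper, upper - lower)
```

The squares counted in the difference are exactly the ones that straddle the unit circle, which is why the gap shrinks roughly in proportion to $h$.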

Use area to denote Jordan content

As we have seen, the Jordan content of any plane figure of elementary Euclidean
geometry is its ordinary area. For that reason, we now go back to the simpler and
more familiar term area. Thus, we say that $S$ has area if it is Jordan measurable; in
that case, the area of $S$ is denoted $A(S) = J(S)$. For a more general set $S$, its inner
area $\underline{A}(S)$ is its inner Jordan content $\underline{J}(S)$, and its outer area $\overline{A}(S)$ is its outer Jordan
content $\overline{J}(S)$.

Volume in $\mathbb{R}^3$

With grids of cubes instead of squares, it requires virtually no alterations to transfer
the theory of Jordan content from $\mathbb{R}^2$ to $\mathbb{R}^3$. Let us assume that has been done.
Then, following what we just did in $\mathbb{R}^2$, we call the Jordan content of a set $S$ in $\mathbb{R}^3$
its volume, and write $V(S) = J(S)$. In fact, we can use the same method to get the
analogue of area or volume in any dimension.


8.3 Riemann and Darboux integrals

We now introduce double integrals and establish their properties; in the next chapter
we develop methods for evaluating them. We define the Riemann integral of a function
$z = f(x,y)$ on a closed bounded set $S$ that has area (i.e., is Jordan measurable).
We assume $f$ is bounded on $S$ and is extended to all of $\mathbb{R}^2$ by setting $f(x,y) = 0$
when $(x,y)$ is not in $S$.

Integration grids

Let $\mathcal{G}$ be a grid of the sort we considered near the end of the previous section.
Thus, the cells $Q$ of $\mathcal{G}$ are closed, bounded, nonoverlapping sets that have area. We
let $A(Q)$ denote the area of $Q$ and $\delta(Q)$ its diameter (cf. p. 291); the diameters have
a finite bound $\|\mathcal{G}\|$, the mesh size of $\mathcal{G}$. The cells of $\mathcal{G}$ must cover $S$, but they need
not cover all of $\mathbb{R}^2$. Furthermore, those cells need not be congruent, nor need they
have straight sides. We call $\mathcal{G}$ an integration grid.

Riemann sums

Let $Q_1, \ldots, Q_N$ be all the cells of $\mathcal{G}$ that meet $S$; we write the area $A(Q_i)$ as $\Delta A_i$.
A Riemann sum for $f$ over $S$ is an expression of the form

$$\sum_{i=1}^{N(\mathcal{G})} f(x_i, y_i)\,\Delta A_i,$$

where $(x_i, y_i)$ is a point of $Q_i$, $i = 1, \ldots, N$. Note that the sum depends upon the grid
$\mathcal{G}$ and the points $(x_i, y_i)$, as well as on $f$ and $S$. By writing $N = N(\mathcal{G})$ as well, we
call attention to the fact that the number of cells that meet $S$ depends on $\mathcal{G}$.

Approximating volumes

To interpret such a sum it helps to let $f$ be positive and continuous, as in the figure
above. Then $f(x_i, y_i)$ is approximately the height of the prism $P_i$ that has base $Q_i$,
vertical sides, and an irregular top formed by the graph of $f$; $f(x_i, y_i)\,\Delta A_i$ is approximately
its volume. The Riemann sum therefore approximates the total volume of
the solid that lies above $S$ and under the graph. To get a better approximation, make
the individual cells smaller; more exactly, use a new grid $\mathcal{G}$ with a smaller mesh
size $\|\mathcal{G}\|$. In fact, we expect all Riemann sums will be as close as we wish to the
actual volume, as long as the mesh size is sufficiently small, independently of the
way the points are chosen in a grid. This leads us to the definition of the Riemann
integral.

The Riemann integral

Absolute (unoriented) double integrals

Definition 8.15 If the Riemann sums for $f$ over $S$ have a limit that is independent
of the points $(x_i, y_i)$ as $\|\mathcal{G}\| \to 0$, then we say $f$ is integrable over $S$ and the integral
is that limit:

$$\iint_S f(x,y)\,dA = \lim_{\|\mathcal{G}\| \to 0} \sum_{i=1}^{N(\mathcal{G})} f(x_i, y_i)\,\Delta A_i.$$

More exactly, this is a Riemann integral, which is different from the Darboux integral
that we introduce presently, as well as from other kinds of integrals that we do not
consider. To make it clear that convergence to the limit is uniform with respect to the
points chosen in the cells of a grid, we put the definition more formally, as follows.
Given an $\varepsilon > 0$, there is a $\delta > 0$ such that, for every grid $\mathcal{G}$ with $\|\mathcal{G}\| < \delta$,

$$\left| \sum_{i=1}^{N(\mathcal{G})} f(x_i, y_i)\,\Delta A_i - \iint_S f(x,y)\,dA \right| < \varepsilon,$$

regardless of the choice of points $(x_i, y_i)$ within the cells of $\mathcal{G}$.
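The phrase "regardless of the choice of points" can be tested by letting a random number generator pick the points. The sketch below rests on our own assumptions: $S$ is the closed unit disk, $f(x,y) = 1 - x^2 - y^2$ on $S$ (extended by zero outside, as in the text), and the grid is made of squares of side $h$. The exact value of the integral, $\pi/2$, comes from a polar-coordinate computation that is not part of this section.

```python
import math
import random

def f(x, y):
    """1 - x^2 - y^2 on the closed unit disk S, extended by zero outside S."""
    r2 = x * x + y * y
    return 1 - r2 if r2 <= 1 else 0.0

def riemann_sum(h, rng):
    """A Riemann sum for f over S on the grid of squares of side h, with one
    randomly chosen point in each square."""
    n = math.ceil(1 / h)             # squares with -n <= i, j < n cover S
    total = 0.0
    for i in range(-n, n):
        for j in range(-n, n):
            x = (i + rng.random()) * h   # a point of the cell [i*h, (i+1)*h]
            y = (j + rng.random()) * h   # a point of the cell [j*h, (j+1)*h]
            total += f(x, y) * h * h     # cells that miss S contribute zero
    return total

for seed in (0, 1, 2):
    print(seed, riemann_sum(0.02, random.Random(seed)), math.pi / 2)
```

Different seeds give different sums, but with a small mesh they all land close to the same value, which is what the definition demands.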

Because the domain is 2-dimensional, we call this a double integral. (A rectangular
grid, as in the following example, provides an even more compelling reason
for the name.) More particularly, this is an absolute, or unoriented, double integral,
because the grid cells $Q_i$ are given no orientation and their areas $\Delta A_i = A(Q_i)$ are
always nonnegative.


[Figure: a rectangular grid in the $(x,y)$-plane with cells $R_{ij} = [a_{i-1}, a_i] \times [b_{j-1}, b_j]$ and a sample point $(x_i, y_j)$ in each cell.]

Here is a special class of grids that are frequently used to construct Riemann
sums. First partition the $x$- and $y$-axes into nonoverlapping intervals:

$$[a_{i-1}, a_i] : a_{i-1} \le x \le a_i, \quad i = 1, \ldots, I,$$

$$[b_{j-1}, b_j] : b_{j-1} \le y \le b_j, \quad j = 1, \ldots, J.$$

Let $\Delta x_i = a_i - a_{i-1}$ and $\Delta y_j = b_j - b_{j-1}$ denote the lengths of these intervals.
Now form the rectangles $R_{ij} = [a_{i-1}, a_i] \times [b_{j-1}, b_j]$; the area of $R_{ij}$ is the product
$A(R_{ij}) = \Delta x_i\,\Delta y_j$. These rectangles are cells of a grid that it is natural to index
with a pair of integers $i, j$ (in contrast, e.g., to the grid $\mathcal{J}_k$).

Let $x_i$ be a point in the $x$-interval $[a_{i-1}, a_i]$, and let $y_j$ be a point in the $y$-interval
$[b_{j-1}, b_j]$. Then $(x_i, y_j)$ is a point in $R_{ij}$, and a Riemann sum for $f$ over $S$ naturally
takes the form of a double sum:

$$\sum_{i=1}^{I} \sum_{j=1}^{J} f(x_i, y_j)\,\Delta x_i\,\Delta y_j.$$

If $f$ is integrable, then these sums approach the integral of $f$ as the grid mesh size
tends to zero. In that case it is natural to replace the “element of area” $dA$ by $dx\,dy$
and to write the limit itself as

$$\iint_S f(x,y)\,dx\,dy = \lim_{\|\mathcal{G}\| \to 0} \sum_{i=1}^{I} \sum_{j=1}^{J} f(x_i, y_j)\,\Delta x_i\,\Delta y_j.$$

The two summation signs on the right now explain why we use a pair of integral
signs to denote the integral, and they also suggest why we call it a double integral.
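A double sum of this kind translates directly into two nested loops. In the sketch below, the function $f(x,y) = xy^2$, the rectangle $S = [0,2] \times [0,3]$, the choice of midpoints as sample points, and the helper names are all our own assumptions for illustration; any point of each interval would serve. The exact value 18 is the product of the two single integrals of $x$ and $y^2$.

```python
def uniform(lo, hi, n):
    """Partition points lo = a_0 < a_1 < ... < a_n = hi, equally spaced."""
    return [lo + (hi - lo) * k / n for k in range(n + 1)]

def double_riemann_sum(f, a, b):
    """Double Riemann sum of f on the rectangular grid with partition points
    a (x-axis) and b (y-axis), sampling f at the midpoint of each cell."""
    total = 0.0
    for i in range(1, len(a)):
        dx = a[i] - a[i - 1]
        xi = (a[i - 1] + a[i]) / 2
        for j in range(1, len(b)):
            dy = b[j] - b[j - 1]
            yj = (b[j - 1] + b[j]) / 2
            total += f(xi, yj) * dx * dy
    return total

f = lambda x, y: x * y * y     # hypothetical example; integral over S is 18

coarse = double_riemann_sum(f, uniform(0, 2, 4), uniform(0, 3, 4))
fine = double_riemann_sum(f, uniform(0, 2, 200), uniform(0, 3, 200))
print(coarse, fine)
```

The partitions need not be equally spaced; `double_riemann_sum` accepts any increasing lists of partition points.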

As we have already noted, the terms in a Riemann sum are products of lengths
$f(x_i, y_i)$ and areas $\Delta A_i$, so we usually think of a Riemann sum and the resulting
integral as volumes. We pursue this below, but first we make a connection between
double integrals and areas.


Theorem 8.24. A constant function $f(x,y) = c$ is integrable over every set $S$ that
has area, and

$$\iint_S c\,dA = c\,A(S) = c \times \operatorname{area} S.$$


Proof. Let $\{\mathcal{G}\}$ be an infinite collection of integration grids whose mesh sizes $\|\mathcal{G}\|$
get arbitrarily close to zero. Let $\overline{G}_{\mathcal{G}}$ be the outer content function associated with $\mathcal{G}$
(cf. p. 293). Because $f(x,y) = 0$ outside $S$, by construction, the grid cells $Q_i$ of $\mathcal{G}$
that make a nonzero contribution to a Riemann sum for $f$ are precisely the ones that
meet $S$; thus

$$\sum_{i=1}^{N} c\,\Delta A_i = c \sum_{i=1}^{N} A(Q_i) = c\,\overline{G}_{\mathcal{G}}(S).$$

The collection $\{\mathcal{G}\}$ satisfies the hypotheses of Theorem 8.23; therefore, because $S$
has area, it is G measurable and $G(S) = A(S)$. Hence, $\overline{G}_{\mathcal{G}}(S) \to A(S)$ as $\|\mathcal{G}\| \to 0$.
This limit does not depend on the collection $\{\mathcal{G}\}$; therefore we conclude all Riemann
sums have the same limit, namely $cA(S)$, and $f$ is thus integrable. □

Riemann double sums

Properties of double integrals

Additivity and linearity of the integral


Theorem 8.25. Any bounded function $f$ on a set $S$ of zero area is integrable, and

$$\iint_S f(x,y)\,dA = 0.$$


Proof. Suppose $B > 0$ is a bound on $f$: $|f(x,y)| \le B$ for all $(x,y)$ in $\mathbb{R}^2$. Let $\varepsilon > 0$ be
given. By Lemma 8.5, we can construct a tubular neighborhood $T$ of $S$ of width $\delta > 0$
for which $J(T) < \varepsilon/B$. Let $\mathcal{G}$ be any grid for which $\|\mathcal{G}\| < \delta$. In a Riemann sum,
the cells $Q_i$ of $\mathcal{G}$ that meet $S$ are contained entirely in $T$; thus, for any choice of $\mathbf{p}_i$
in $Q_i$, we have

$$\left| \sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i - 0 \right| \le \sum_{i=1}^{N} |f(\mathbf{p}_i)|\,\Delta A_i \le B \sum_{i=1}^{N} \Delta A_i \le B\,J(T) < \varepsilon.$$

This shows that $f$ is integrable and the value of the integral is 0. □

Arguments similar to those in the last proof can be used to establish the following
general properties of double integrals; see the exercises. These properties are
formally the same as those of integrals of a single-variable function.


Theorem 8.26. Suppose $f$ and $g$ are integrable over $S$; then so are $cf$ and $f \pm g$,
and

$$\iint_S cf\,dA = c \iint_S f\,dA, \qquad \iint_S (f \pm g)\,dA = \iint_S f\,dA \pm \iint_S g\,dA. \qquad \square$$

Theorem 8.27. Suppose $f$ is integrable over the nonoverlapping sets $R$ and $S$; then
$f$ is integrable over $R \cup S$ and

$$\iint_{R \cup S} f\,dA = \iint_R f\,dA + \iint_S f\,dA. \qquad \square$$

The second theorem says that the integral is additive over sets in the same sense
that area (Jordan content) is; see Corollary 8.14, page 286. The first theorem says
that the integral acts as a linear operator on functions.

Theorem 8.28. Suppose $f$ is integrable over $S$ and $f(x,y) \ge 0$ on $S$; then

$$\iint_S f(x,y)\,dA \ge 0. \qquad \square$$

Corollary 8.29 Suppose $f$ and $g$ are integrable over $S$, and $f(x,y) \le g(x,y)$ for
every point $(x,y)$ in $S$; then

$$\iint_S f(x,y)\,dA \le \iint_S g(x,y)\,dA. \qquad \square$$

Corollary 8.30 Suppose $f$ is integrable over $S$ and $m \le f(x,y) \le M$ for every point
$(x,y)$ in $S$; then

$$m\,A(S) \le \iint_S f(x,y)\,dA \le M\,A(S).$$

Proof. By Theorem 8.26, $g(x,y) = f(x,y) - m \ge 0$ is integrable, and

$$0 \le \iint_S g(x,y)\,dA = \iint_S f(x,y)\,dA - \iint_S m\,dA = \iint_S f(x,y)\,dA - m\,A(S),$$

which leads to the first of the stated inequalities. The second is obtained in a similar
way. □

We have a standing assumption that integration applies only to bounded functions.
That assumption is essential in the last corollary. For example, let $S$ be the
unit interval on the $x$-axis, so $A(S) = 0$ as a region in the plane. Let

$$f(x,y) = \begin{cases} 1/\sqrt{x}, & 0 < x \le 1,\ y = 0, \\ 0, & \text{otherwise.} \end{cases}$$

Now consider Riemann sums for $f$ on $S$ constructed with the grids $\mathcal{J}_n$ used to define
Jordan content. Let $Q_1$ be the square $[0, 1/2^n] \times [0, 1/2^n]$ and let $\mathbf{p}_1 = (1/2^{6n}, 0)$.
Then

$$f(\mathbf{p}_1)\,\Delta A_1 = \frac{1}{\sqrt{1/2^{6n}}} \cdot \frac{1}{2^{2n}} = 2^{3n} \cdot \frac{1}{2^{2n}} = 2^n \to \infty \quad \text{as } n \to \infty,$$

so no Riemann sum that contains this term can converge to a finite value. In other
words, $f$ is not integrable, even though its domain has area zero.
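The growth of the offending term is easy to tabulate. The sketch below follows the example exactly as stated; only the function name `offending_term` and the range of $n$ are our own. Because every quantity involved is a power of 2, the floating-point arithmetic here is exact.

```python
import math

def f(x, y):
    """The unbounded function of the example: 1/sqrt(x) on the segment
    0 < x <= 1, y = 0, and zero elsewhere."""
    return 1 / math.sqrt(x) if (0 < x <= 1 and y == 0) else 0.0

def offending_term(n):
    """The term f(p_1) * (area of Q_1) for the square Q_1 of side 1/2**n and
    the sample point p_1 = (1/2**(6n), 0)."""
    side = 1 / 2**n
    p1 = (1 / 2**(6 * n), 0.0)
    return f(*p1) * side * side

for n in range(1, 11):
    print(n, offending_term(n))
```

The area of the cell shrinks like $1/4^n$, but the sample point is pushed toward the origin fast enough that the function value grows like $8^n$, and the product doubles at every step.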

We now turn to the Darboux integral. It is constructed from sums involving upper 
and lower bounds of a function on each cell of a grid. Although the Darboux and 
Riemann integrals have similar definitions (and we eventually show they have the 
same value), the Darboux integral is defined more in the style of Jordan content: 
there are analogues of inner and outer areas on a grid and inner and outer content as 
limits. 

Suppose $f$ is bounded on $S$, and $\mathcal{G}$ is an integration grid whose cells $Q_1, \ldots, Q_N$
cover $S$. We do not assume $f$ is Riemann integrable over $S$. Let $M_i$ be the smallest
of the upper bounds (the “least upper bound”) for $f$ on $Q_i$, and let $m_i$ be the largest
of the lower bounds (the “greatest lower bound”). In other words,

$$m_i \le f(\mathbf{p}_i) \le M_i$$

for all points $\mathbf{p}_i$ in $Q_i$, and these bounds are the best possible. To see what “best
possible” means here, consider the following example:


$$f(x) = \begin{cases} 1, & x = 0, \\ x, & 0 < x \le 1, \end{cases}$$

where $Q : 0 \le x \le 1$. Then 0 is a lower bound for $f$ on $Q$, but no larger number $\varepsilon > 0$
is a lower bound, because we can always choose $x$ in $Q$ so that $f(x) < \varepsilon$ (e.g., take
$x = \varepsilon/2$). Thus $m = 0$ is the greatest lower bound, which is also called an infimum.
Because there is no “smallest” positive real number, the set of values $f(x)$ has no
minimum, but it does have an infimum.

[Figure: graph of the example function $f$ on $0 \le x \le 1$.]

Integrate only bounded functions

Bounds and the Darboux integral

Least upper bound; greatest lower bound

inf and sup

Lower and upper Darboux sums

Lower and upper Darboux integrals

Definition 8.16 Any nonempty set of numbers $Z$ that is bounded below has a greatest
lower bound, glb $Z$, or infimum, inf $Z$; if $Z$ is bounded above, it has a least upper
bound, lub $Z$, or supremum, sup $Z$.

Definition 8.17 The lower and upper Darboux sums for $f$ over $S$ and the grid $\mathcal{G}$
are, respectively,

$$\underline{D}_{\mathcal{G}}(f,S) = \sum_{i=1}^{N} m_i\,\Delta A_i, \qquad \overline{D}_{\mathcal{G}}(f,S) = \sum_{i=1}^{N} M_i\,\Delta A_i.$$

Lower and upper Darboux sums give us lower and upper bounds on all possible
Riemann sums that can be constructed with the grid $\mathcal{G}$; that is,

$$\underline{D}_{\mathcal{G}}(f,S) \le \sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i \le \overline{D}_{\mathcal{G}}(f,S),$$

no matter how $\mathbf{p}_i$ is chosen in $Q_i$. The following lemma, which says no lower sum
is larger than any upper sum, plays the same role here that Lemma 8.2, page 281,
does for the inner and outer area estimates of Jordan content.

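Lemma 8.7. For every pair of integration grids $\mathcal{H}$ and $\mathcal{G}$,

$$\underline{D}_{\mathcal{H}}(f,S) \le \overline{D}_{\mathcal{G}}(f,S).$$


Proof. We construct the common refinement, $\mathcal{K}$, of $\mathcal{H}$ and $\mathcal{G}$. The cells of $\mathcal{K}$
consist of the intersections $P \cap Q$, where $P$ is a cell in $\mathcal{H}$ and $Q$ is a cell in $\mathcal{G}$. Then
$\mathcal{K}$ does indeed refine both $\mathcal{H}$ and $\mathcal{G}$, so the usual arguments about refinements
imply

$$\underline{D}_{\mathcal{H}}(f,S) \le \underline{D}_{\mathcal{K}}(f,S) \le \overline{D}_{\mathcal{K}}(f,S) \le \overline{D}_{\mathcal{G}}(f,S). \qquad \square$$

Thus each upper sum is an upper bound for all lower sums, and each lower sum is a
lower bound for all upper sums. Consequently, the following least upper bound and
the greatest lower bound are well-defined.

Definition 8.18 The lower Darboux integral $\underline{D}(f,S)$ of $f$ over $S$ is the least upper
bound of the numbers $\underline{D}_{\mathcal{G}}(f,S)$, over all grids $\mathcal{G}$. Similarly, the upper Darboux
integral $\overline{D}(f,S)$ is the greatest lower bound of the numbers $\overline{D}_{\mathcal{G}}(f,S)$.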
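Lemma 8.7 can be checked on a concrete case in which the bounds $m_i$ and $M_i$ are known exactly. The sketch below uses our own example: $f(x,y) = x + y$ on the unit square $S = [0,1] \times [0,1]$, with the grid of $n \times n$ equal squares. Since $f$ increases in each variable, its greatest lower bound on a cell is its value at the lower-left corner and its least upper bound is its value at the upper-right corner. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction

def darboux_sums(n):
    """Lower and upper Darboux sums of f(x, y) = x + y over the unit square,
    for the grid of n x n equal square cells."""
    h = Fraction(1, n)
    lower = upper = Fraction(0)
    for i in range(n):
        for j in range(n):
            m_ij = (i + j) * h        # inf of x + y on the cell: lower-left corner
            M_ij = (i + j + 2) * h    # sup of x + y on the cell: upper-right corner
            lower += m_ij * h * h
            upper += M_ij * h * h
    return lower, upper

for n in (1, 2, 3, 5, 8):
    print(n, *darboux_sums(n))
```

The $3 \times 3$ and $5 \times 5$ grids are not refinements of one another, yet every lower sum still sits below every upper sum, as the lemma asserts. In this example the lower sums are $1 - 1/n$ and the upper sums are $1 + 1/n$, so the lower and upper Darboux integrals are both 1.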

Theorem 8.31. $\underline{D}(f,S) \le \overline{D}(f,S)$.

Proof. Choose $\mathcal{G}$ arbitrarily; by Lemma 8.7, $\overline{D}_{\mathcal{G}}(f,S)$ is an upper bound for all
possible lower sums, so it is at least as large as their least upper bound:

$$\underline{D}(f,S) \le \overline{D}_{\mathcal{G}}(f,S).$$

By this inequality, the lower integral $\underline{D}(f,S)$ is a lower bound for all possible upper
sums (because $\mathcal{G}$ is arbitrary), so it is no larger than their greatest lower bound:

$$\underline{D}(f,S) \le \overline{D}(f,S). \qquad \square$$

Definition 8.19 If $\underline{D}(f,S) = \overline{D}(f,S)$, then $f$ is Darboux integrable over $S$, and its
Darboux integral is $D(f,S) = \underline{D}(f,S) = \overline{D}(f,S)$.

The next two theorems establish that the two notions of integral are equivalent. 

Theorem 8.32. If f is Riemann integrable on S, then it is also Darboux integrable, 
and the two integrals are equal. 

Proof. Because $f$ is bounded, its upper and lower Darboux sums are defined for all
grids. To prove the theorem, it is enough to show that, for any given $\varepsilon > 0$, there is
a grid $\mathcal{G}$ for which

$$\overline{D}_{\mathcal{G}}(f,S) - \underline{D}_{\mathcal{G}}(f,S) < \varepsilon.$$

Because $\varepsilon > 0$ is arbitrary, it then follows that $\overline{D}(f,S) - \underline{D}(f,S) = 0$ and that $f$
is Darboux integrable. Moreover, because every Riemann sum is trapped between
$\underline{D}_{\mathcal{G}}(f,S)$ and $\overline{D}_{\mathcal{G}}(f,S)$, so is the Riemann integral. The Darboux integral is trapped
the same way, so the two integrals must be equal.

Using the given $\varepsilon > 0$ and the hypothesis that $f$ is Riemann integrable, choose
$\delta > 0$ so that, for every integration grid $\mathcal{G}$ with $\|\mathcal{G}\| < \delta$,

$$\left| \sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i - \iint_S f(x,y)\,dA \right| < \frac{\varepsilon}{4},$$

regardless of how $\mathbf{p}_i$ is chosen in the cell $Q_i$ of the grid $\mathcal{G}$ (cf. Definition 8.15). What
we take from this is the fact that the difference between any two Riemann sums for
$f$ with the grid $\mathcal{G}$ is less than $\varepsilon/2$.

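Fix a grid $\mathcal{G}$ for which $\|\mathcal{G}\| < \delta$, and let $Q_1, \ldots, Q_N$ be the cells of $\mathcal{G}$ that meet $S$.
Construct the lower and upper Darboux sums

$$\underline{D}_{\mathcal{G}}(f,S) = \sum_{i=1}^{N} m_i\,\Delta A_i, \qquad \overline{D}_{\mathcal{G}}(f,S) = \sum_{i=1}^{N} M_i\,\Delta A_i$$

in the usual way, and set

$$A = \sum_{i=1}^{N} \Delta A_i = \overline{J}_{\mathcal{G}}(S),$$

the outer content of $S$ with respect to the grid $\mathcal{G}$.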

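Because $m_i$ is the greatest lower bound of $f(\mathbf{x})$ on $Q_i$, $m_i + \varepsilon/4A$ is not a lower
bound. In other words, there is a point $\mathbf{p}_i$ in each $Q_i$ for which

$$f(\mathbf{p}_i) < m_i + \frac{\varepsilon}{4A}.$$

Therefore,

$$\sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i < \sum_{i=1}^{N} m_i\,\Delta A_i + \frac{\varepsilon}{4A} \sum_{i=1}^{N} \Delta A_i = \underline{D}_{\mathcal{G}}(f,S) + \frac{\varepsilon}{4}.$$

In a similar way, there is a point $\mathbf{q}_i$ in each $Q_i$ for which

$$M_i - \frac{\varepsilon}{4A} < f(\mathbf{q}_i),$$

and a similar argument shows that

$$\overline{D}_{\mathcal{G}}(f,S) - \frac{\varepsilon}{4} < \sum_{i=1}^{N} f(\mathbf{q}_i)\,\Delta A_i.$$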

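Subtracting the first inequality from the second, we find

$$\overline{D}_{\mathcal{G}}(f,S) - \underline{D}_{\mathcal{G}}(f,S) - \frac{\varepsilon}{2} < \sum_{i=1}^{N} f(\mathbf{q}_i)\,\Delta A_i - \sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i < \frac{\varepsilon}{2}.$$

The last inequality in this sequence is just the fact that any two Riemann sums differ
by less than $\varepsilon/2$. Hence $\overline{D}_{\mathcal{G}}(f,S) - \underline{D}_{\mathcal{G}}(f,S) < \varepsilon$; by what was said above, this
completes the proof. □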

Theorem 8.33. If f is Darboux integrable on S, then it is also Riemann integrable, 
and the two integrals are equal. 

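Proof. Let $D$ be the value of the Darboux integral of $f$ on $S$. We must show that, for
any given $\varepsilon > 0$, there is a $\delta > 0$ so that, for any integration grid $\mathcal{G}$ with $\|\mathcal{G}\| < \delta$,

$$\left| \sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i - D \right| < \varepsilon,$$

regardless of how the point $\mathbf{p}_i$ is chosen in the cell $Q_i$ of $\mathcal{G}$.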

Every Darboux integrable function is bounded, by definition; choose $B$ so that
$|f(x,y)| \le B$ on $S$. The definition also implies that upper and lower Darboux sums
for $f$ get arbitrarily close to $D$. Thus, for the $\varepsilon$ given above, we can select a particular
grid $\mathcal{H}$ for which

$$D - \frac{\varepsilon}{2} < \underline{D}_{\mathcal{H}}(f,S) \quad\text{and}\quad \overline{D}_{\mathcal{H}}(f,S) < D + \frac{\varepsilon}{2}.$$

Suppose the cells of $\mathcal{H}$ that cover $S$ are $P_1, \ldots, P_J$. Let $\partial P$ denote the set of boundary
points of all these cells. Because each $P_j$ has area, $A(\partial P_j) = 0$ and thus $A(\partial P) = 0$.
By Lemma 8.5, page 289, $\partial P$ has a tubular neighborhood $T$ of some width $\delta > 0$
for which

$$J(T) < \frac{\varepsilon}{4B}.$$

Now let $\mathcal{G}$ be any integration grid for which $\|\mathcal{G}\| < \delta$. We claim that

$$D - \varepsilon < \underline{D}_{\mathcal{G}}(f,S) \quad\text{and}\quad \overline{D}_{\mathcal{G}}(f,S) < D + \varepsilon.$$

Every Riemann sum constructed with the grid $\mathcal{G}$ lies between $\underline{D}_{\mathcal{G}}(f,S)$ and $\overline{D}_{\mathcal{G}}(f,S)$;
thus it follows from the claim that

$$D - \varepsilon < \sum_{i=1}^{N} f(\mathbf{p}_i)\,\Delta A_i < D + \varepsilon,$$

which is equivalent to what we need to prove.

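We now prove the first of the inequalities in the claim, $D - \varepsilon < \underline{D}_{\mathcal{G}}(f,S)$; the second
can be proven by essentially the same argument. We divide the cells of $\mathcal{G}$ into two classes, as
follows.

• $R_1, \ldots, R_K$ lie entirely within the tubular neighborhood $T$.

• $Q_1, \ldots, Q_N$ contain points outside $T$.

Now let $\mathcal{K}$ be the common refinement of $\mathcal{H}$ and $\mathcal{G}$; by definition, the cells of $\mathcal{K}$
are $P_j \cap Q_i$ and $P_j \cap R_k$. But because the diameter of each $Q_i$ is less than $\delta$, $Q_i$ does
not meet $\partial P$, and thus lies entirely in a single cell $P_j$ of $\mathcal{H}$. In other words, $P_j \cap Q_i$
is either empty or it is just $Q_i$. The cells of $\mathcal{K}$ are therefore

$$Q_1, \ldots, Q_N, \quad\text{and}\quad
\begin{matrix}
P_1 \cap R_1, & \ldots, & P_1 \cap R_K, \\
P_2 \cap R_1, & \ldots, & P_2 \cap R_K, \\
\vdots & & \vdots \\
P_J \cap R_1, & \ldots, & P_J \cap R_K.
\end{matrix}$$

We have

$$\sum_{j=1}^{J} A(P_j \cap R_k) = A(R_k), \qquad \sum_{k=1}^{K} A(R_k) \le J(T) < \frac{\varepsilon}{4B}.$$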

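We now construct the Darboux lower sums associated with $\mathcal{G}$ and $\mathcal{K}$. For this
we need the greatest lower bound of $f$ on each cell of each of these grids:

$$m_i = \inf_{\mathbf{p} \in Q_i} f(\mathbf{p}), \qquad m_k = \inf_{\mathbf{p} \in R_k} f(\mathbf{p}), \qquad m_{jk} = \inf_{\mathbf{p} \in P_j \cap R_k} f(\mathbf{p}).$$

Then

$$\underline{D}_{\mathcal{K}}(f,S) = \sum_{i=1}^{N} m_i A(Q_i) + \sum_{k=1}^{K} \sum_{j=1}^{J} m_{jk} A(P_j \cap R_k),$$

$$\underline{D}_{\mathcal{G}}(f,S) = \sum_{i=1}^{N} m_i A(Q_i) + \sum_{k=1}^{K} m_k A(R_k) = \sum_{i=1}^{N} m_i A(Q_i) + \sum_{k=1}^{K} \sum_{j=1}^{J} m_k A(P_j \cap R_k).$$

Subtracting, we find

$$\underline{D}_{\mathcal{K}}(f,S) - \underline{D}_{\mathcal{G}}(f,S) = \sum_{k=1}^{K} \sum_{j=1}^{J} (m_{jk} - m_k) A(P_j \cap R_k)
\le 2B \sum_{k=1}^{K} \sum_{j=1}^{J} A(P_j \cap R_k) \le 2B \sum_{k=1}^{K} A(R_k) < 2B \cdot \frac{\varepsilon}{4B} = \frac{\varepsilon}{2},$$

or $\underline{D}_{\mathcal{K}}(f,S) < \underline{D}_{\mathcal{G}}(f,S) + \varepsilon/2$. Because $\mathcal{K}$ is a refinement of $\mathcal{H}$,

$$\underline{D}_{\mathcal{H}}(f,S) \le \underline{D}_{\mathcal{K}}(f,S).$$

We thus have a sequence of inequalities,

$$D - \frac{\varepsilon}{2} < \underline{D}_{\mathcal{H}}(f,S) \le \underline{D}_{\mathcal{K}}(f,S) < \underline{D}_{\mathcal{G}}(f,S) + \frac{\varepsilon}{2},$$

that together establish the first claim, $D - \varepsilon < \underline{D}_{\mathcal{G}}(f,S)$. □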

The common value produced by the two definitions is sometimes called the
Riemann-Darboux integral. However, we usually just call it the integral, and employ
the two definitions interchangeably, depending on which is more useful in a
particular situation. For example, the proof of the next theorem uses the Darboux
characterization of the integral.

Theorem 8.34. Suppose $f$ is integrable on $S$; then so is $|f|$ and

$$\left| \iint_S f(x,y)\,dA \right| \le \iint_S |f(x,y)|\,dA.$$


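Proof. First we prove $|f|$ is integrable; it is enough to show that, given any $\varepsilon > 0$,
there is a grid $\mathcal{H}$ for which

$$\overline{D}_{\mathcal{H}}(|f|,S) - \underline{D}_{\mathcal{H}}(|f|,S) < \varepsilon.$$

To analyze the upper and lower Darboux sums for both $|f|$ and $f$, let $Q_1, \ldots, Q_N$ be
the cells in an arbitrary grid $\mathcal{G}$, and let

$$m_i^* = \inf_{\mathbf{p} \in Q_i} |f(\mathbf{p})|, \qquad m_i = \inf_{\mathbf{p} \in Q_i} f(\mathbf{p}),$$

$$M_i^* = \sup_{\mathbf{p} \in Q_i} |f(\mathbf{p})|, \qquad M_i = \sup_{\mathbf{p} \in Q_i} f(\mathbf{p}).$$

Now it is always true that $M_i^* - m_i^* \le M_i - m_i$ (see Exercise 8.20); thus, for any
grid $\mathcal{G}$,

$$\overline{D}_{\mathcal{G}}(|f|,S) - \underline{D}_{\mathcal{G}}(|f|,S) = \sum_{i=1}^{N} (M_i^* - m_i^*)\,\Delta A_i
\le \sum_{i=1}^{N} (M_i - m_i)\,\Delta A_i = \overline{D}_{\mathcal{G}}(f,S) - \underline{D}_{\mathcal{G}}(f,S).$$

Because $f$ is integrable, by hypothesis, there is always a grid $\mathcal{H}$ for which

$$\overline{D}_{\mathcal{H}}(f,S) - \underline{D}_{\mathcal{H}}(f,S) < \varepsilon;$$

then $\overline{D}_{\mathcal{H}}(|f|,S) - \underline{D}_{\mathcal{H}}(|f|,S) < \varepsilon$ as well, so $|f|$ is integrable.

Finally, because $-|f(x,y)| \le f(x,y) \le |f(x,y)|$ and the integral is monotone (cf.
Corollary 8.29),

$$-\iint_S |f(x,y)|\,dA \le \iint_S f(x,y)\,dA \le \iint_S |f(x,y)|\,dA. \qquad \square$$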


One of the fundamental results of calculus is that a continuous function is integrable.
However, because we extend every function to the whole plane by setting
it equal to zero outside its given domain, even a continuous function becomes, in
general, discontinuous across the boundary of that domain. This causes difficulties
in proving integrability, because some cells of a grid straddle the boundary; in those
cells, the function can take widely different values, even if the cell is small. In proving
that a continuous function is integrable, we take all this into account,
and even allow certain other discontinuities in the function.

Theorem 8.35. Let $Z$ be a subset of $S$ with $A(Z) = 0$. If $f$ is bounded and continuous
on $S \setminus Z$, then $f$ is integrable on $S$.

Proof. We show that, given any $\varepsilon > 0$, there is a grid $\mathcal{J}_k$ (one of the original grids
of congruent squares) for which

$$\overline{D}_{\mathcal{J}_k}(f,S) - \underline{D}_{\mathcal{J}_k}(f,S) < \varepsilon.$$

Let $B$ be a global bound for $f$; that is, $|f(x,y)| \le B$ for all points $(x,y)$ in $\mathbb{R}^2$. Let $T$
be a tubular neighborhood of $Z \cup \partial S$ of positive width, chosen so that $J(T) < \varepsilon/4B$.

Now $T$ contains all points of discontinuity of $f$, so $f$ is continuous everywhere on
$S \setminus T$. Furthermore, $T$ is open so $S \setminus T$ is closed and bounded, implying $f$ is uniformly
continuous there. Thus, for the given $\varepsilon$, we can choose a $\delta > 0$ so that if $\mathbf{p}$ and $\mathbf{q}$ are
in $S \setminus T$, then

$$\|\mathbf{p} - \mathbf{q}\| < \delta \implies |f(\mathbf{p}) - f(\mathbf{q})| < \frac{\varepsilon}{2A(S)}.$$

Here $A(S)$ is the area of $S$; by Theorem 8.25, we may assume that $A(S) > 0$.

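Now consider any grid $\mathcal{J}_k$ for which $\|\mathcal{J}_k\| < \delta$ ($k > 1 + \log_2(1/\delta)$ suffices), and
divide the squares of $\mathcal{J}_k$ into two classes:

• $Q_1, \ldots, Q_N$ lie entirely within $S \setminus T$.

• $R_1, \ldots, R_L$ meet the tubular neighborhood $T$.

Let

$$m_i = \inf_{\mathbf{p} \in Q_i} f(\mathbf{p}), \qquad \widehat{m}_l = \inf_{\mathbf{p} \in R_l} f(\mathbf{p}),$$

$$M_i = \sup_{\mathbf{p} \in Q_i} f(\mathbf{p}), \qquad \widehat{M}_l = \sup_{\mathbf{p} \in R_l} f(\mathbf{p}).$$

Then the difference between the upper and lower Darboux sums over $\mathcal{J}_k$ is

$$\overline{D}_{\mathcal{J}_k}(f,S) - \underline{D}_{\mathcal{J}_k}(f,S) = \sum_{i=1}^{N} (M_i - m_i) A(Q_i) + \sum_{l=1}^{L} (\widehat{M}_l - \widehat{m}_l) A(R_l).$$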

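Consider the first sum on the right. Because $f$ is continuous on each closed bounded
set $Q_i$ (because $Q_i$ lies entirely in $S \setminus T$), $Q_i$ contains points $\mathbf{q}_i$ and $\mathbf{p}_i$ at which $f$
attains its supremum and its infimum, respectively:

$$M_i = f(\mathbf{q}_i), \qquad m_i = f(\mathbf{p}_i).$$

But because $Q_i$ has diameter less than $\delta$, we have $\|\mathbf{q}_i - \mathbf{p}_i\| < \delta$, so

$$M_i - m_i = f(\mathbf{q}_i) - f(\mathbf{p}_i) < \frac{\varepsilon}{2A(S)}.$$

Therefore,

$$\sum_{i=1}^{N} (M_i - m_i) A(Q_i) = \sum_{i=1}^{N} \bigl(f(\mathbf{q}_i) - f(\mathbf{p}_i)\bigr) A(Q_i) < \frac{\varepsilon}{2A(S)} \sum_{i=1}^{N} A(Q_i).$$

All the squares $Q_i$ lie entirely within $S$; thus their total area is not greater than $A(S)$,
implying

$$\sum_{i=1}^{N} A(Q_i) \le A(S) \quad\text{and}\quad \sum_{i=1}^{N} (M_i - m_i) A(Q_i) < \frac{\varepsilon}{2A(S)}\,A(S) = \frac{\varepsilon}{2}.$$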

Now consider the second sum, the one involving the cells $R_l$. Because $\widehat{M}_l$ and
$-\widehat{m}_l$ are both bounded by $B$,

$$\sum_{l=1}^{L} (\widehat{M}_l - \widehat{m}_l) A(R_l) \le 2B \sum_{l=1}^{L} A(R_l).$$

Because the squares $R_l$ cover $T$ and they all meet $T$, they are precisely the squares
involved in computing the outer area of $T$ with the grid $\mathcal{J}_k$:

$$\sum_{l=1}^{L} A(R_l) = \overline{J}_k(T).$$

By the definition of outer Jordan content, we know $\overline{J}_k(T)$ decreases monotonically
to $J(T)$ as $k \to \infty$. Because $J(T) < \varepsilon/4B$ by construction, there is a $K$ sufficiently
large that $\overline{J}_k(T) < \varepsilon/4B$ as well whenever $k \ge K$, implying

$$\sum_{l=1}^{L} (\widehat{M}_l - \widehat{m}_l) A(R_l) < 2B \cdot \frac{\varepsilon}{4B} = \frac{\varepsilon}{2}.$$

Therefore, if $k \ge K$ and $k > 1 + \log_2(1/\delta)$, then

$$\overline{D}_{\mathcal{J}_k}(f,S) - \underline{D}_{\mathcal{J}_k}(f,S) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \qquad \square$$
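
The role of the cells that straddle the boundary can be seen in the simplest case. The Python sketch below is an illustration under stated assumptions, not part of the proof: it takes $S$ to be the closed unit disk and $f = 1$ on $S$ (so $f = 0$ outside), and uses the grids $\mathcal{J}_k$ of squares of side $2^{-k}$. On a closed square the supremum of $f$ is 1 exactly when the square meets $S$, and the infimum is 1 exactly when the square lies in $S$, so the lower and upper Darboux sums are the inner and outer areas. The function name `disk_cell_counts` is hypothetical.

```python
def disk_cell_counts(k):
    """Count the squares of the grid J_k (side 2**-k) that lie inside the
    closed unit disk, and those that meet it. Integer arithmetic in units
    of 2**-k keeps the boundary cases exact."""
    n = 2 ** k
    inner = outer = 0
    for i in range(-n - 1, n + 1):
        for j in range(-n - 1, n + 1):
            # The cell is [i/n, (i+1)/n] x [j/n, (j+1)/n].
            near_x = i if i >= 0 else i + 1   # x-coordinate closest to 0
            near_y = j if j >= 0 else j + 1
            far_x = i + 1 if i >= 0 else i    # x-coordinate farthest from 0
            far_y = j + 1 if j >= 0 else j
            if near_x ** 2 + near_y ** 2 <= n * n:
                outer += 1                    # nearest point is in the disk
            if far_x ** 2 + far_y ** 2 <= n * n:
                inner += 1                    # farthest corner is in the disk
    return inner, outer

for k in range(5):
    inner, outer = disk_cell_counts(k)
    lower, upper = inner / 4 ** k, outer / 4 ** k
    print(k, inner, outer, lower, upper, upper - lower)
```

The gap between the upper and lower sums is the total area of the squares that straddle the unit circle, which is the quantity the proof controls with the tubular neighborhood $T$; it shrinks as $k$ grows.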


For functions of a single variable, there is an analogue of the preceding theorem
that is particularly useful because it provides us with a large class of integrable
functions. For example, it implies that a function $y = f(x)$ with only a finite number
of finite jump discontinuities is integrable. To prove it, just adapt (and simplify)
the preceding proof; see Exercise 8.18.

Theorem 8.36. Suppose $f(x)$ is bounded and continuous on a closed interval $[a,b]$
minus a finite set of points; then $f$ is integrable on $[a,b]$. □


The proof of Theorem 8.35 also implies that restricted Riemann sums, using
only the cells contained in the interior of the domain of integration, have the same
limit as unrestricted sums. The following corollary provides the details.

Corollary 8.37 Suppose $f$ is bounded and integrable on $S$; then

$$\iint_S f(x,y)\,dA = \lim_{\|\mathcal{G}\| \to 0} \sum_{Q_j \subset {}^{\circ}S} f(\mathbf{q}_j) A(Q_j),$$

where the sum is taken over only those cells $Q_j$ of $\mathcal{G}$ that lie within ${}^{\circ}S$, the interior
of $S$.

Proof. Let $\varepsilon > 0$ be given. Then, because $f$ is integrable over $S$, there is a $\delta > 0$
such that any grid $\mathcal{G}$ with $\|\mathcal{G}\| < \delta$ has

$$\left| \iint_S f(x,y)\,dA - \sum_{P_i \cap S \ne \emptyset} f(\mathbf{p}_i) A(P_i) \right| < \frac{\varepsilon}{2}.$$

(The sum on the right is an ordinary, unrestricted Riemann sum over all cells $P_i$ of
$\mathcal{G}$ that meet $S$.) Suppose $|f(x,y)| \le B$ for all $(x,y)$ in $S$. Let $T$ be a tubular neighborhood
of $\partial S$ for which $J(T) < \varepsilon/2B$, and suppose that $T$ has width $w > 0$ (cf.
Definition 8.13, p. 289). Further restrict $\mathcal{G}$, if necessary, so that $\|\mathcal{G}\| < w$; then all
the cells $P_i$ of $\mathcal{G}$ that appear in the unrestricted Riemann sum above are contained
in $S \cup T$. Divide these cells into two classes:

• $Q_1, \ldots, Q_J$ lie entirely within $S \setminus \partial S = {}^{\circ}S$.

• $R_1, \ldots, R_K$ meet $\partial S$ (and hence lie entirely within $T$).

Then, for any $\mathbf{r}_k$ in $R_k$, $k = 1, \ldots, K$,

$$\left| \sum_{R_k} f(\mathbf{r}_k) A(R_k) \right| \le \sum_{R_k} |f(\mathbf{r}_k)|\,A(R_k) \le B \sum_{R_k} A(R_k),$$

but because $R_1 \cup \cdots \cup R_K \subset T$, we have

$$\sum_{R_k} A(R_k) \le J(T) < \frac{\varepsilon}{2B} \quad\text{and}\quad \left| \sum_{R_k} f(\mathbf{r}_k) A(R_k) \right| < \frac{\varepsilon}{2}.$$

Consequently, for any $\mathbf{q}_j$ in $Q_j$ ($j = 1, \ldots, J$),

$$\left| \iint_S f(x,y)\,dA - \sum_{Q_j} f(\mathbf{q}_j) A(Q_j) \right|
\le \left| \iint_S f(x,y)\,dA - \sum_{P_i} f(\mathbf{p}_i) A(P_i) \right| + \left| \sum_{R_k} f(\mathbf{r}_k) A(R_k) \right| < \varepsilon,$$

where $\mathbf{p}_i = \mathbf{q}_j$ when $P_i = Q_j$ and $\mathbf{p}_i = \mathbf{r}_k$ when $P_i = R_k$. This proves that the integral
is the limit of restricted Riemann sums. □
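
A small numerical experiment makes the corollary concrete. The sketch below is only an illustration under assumptions not found in the text: $S$ is the closed unit disk, $f(x,y) = x^2 + y^2$ (whose integral over $S$ is $\pi/2$), the grids are the $\mathcal{J}_k$ of squares of side $2^{-k}$, and each sample point is the center of its cell. The function name `riemann_sums` is hypothetical.

```python
import math

def riemann_sums(k):
    """Restricted and unrestricted Riemann sums of f(x, y) = x^2 + y^2 over
    the closed unit disk S, on the grid of squares of side 2**-k, sampling
    at cell centers. f is taken to be 0 outside S."""
    n = 2 ** k
    h = 1.0 / n
    area = h * h
    restricted = 0.0
    unrestricted = 0.0
    for i in range(-n - 1, n + 1):
        for j in range(-n - 1, n + 1):
            cx, cy = (i + 0.5) * h, (j + 0.5) * h     # cell center
            r2 = cx * cx + cy * cy
            value = r2 if r2 <= 1.0 else 0.0          # f extended by zero
            near_x = i if i >= 0 else i + 1           # in units of 1/n
            near_y = j if j >= 0 else j + 1
            far_x = i + 1 if i >= 0 else i
            far_y = j + 1 if j >= 0 else j
            if near_x ** 2 + near_y ** 2 <= n * n:    # cell meets S
                unrestricted += value * area
            if far_x ** 2 + far_y ** 2 < n * n:       # cell lies in the interior
                restricted += value * area
    return restricted, unrestricted

exact = math.pi / 2
for k in (2, 4, 6):
    r, u = riemann_sums(k)
    print(k, exact - r, exact - u)
```

The restricted sum omits the boundary cells entirely, so it approaches the integral from below and more slowly than the unrestricted sum, but both errors are controlled by the total area of the cells that meet $\partial S$.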


Theorem 8.35 also provides a way to connect double integrals to volumes. In
the following theorem, the volume of a solid $W$ is its 3-dimensional Jordan content,
denoted $V(W)$. We make use of the analogue of Theorem 8.9, that the volume of the
rectangular parallelepiped $[a, a + l] \times [b, b + w] \times [c, c + h]$ is the product $lwh$.


Theorem 8.38. Suppose $f \ge 0$ is bounded and integrable on a closed bounded set $S$
that has area, and $W$ is the solid region in $\mathbb{R}^3$ that lies between $S$ and the graph of
$z = f(x,y)$. Then $W$ has volume, and

$$V(W) = \iint_S f(x,y)\,dA.$$


Proof. We show that, given any $\varepsilon > 0$, there is a grid $\mathcal{J}_k$ of squares in the plane for
which

$$\underline{D}_{\mathcal{J}_k}(f,S) - \varepsilon < \underline{V}(W) \le \overline{V}(W) \le \overline{D}_{\mathcal{J}_k}(f,S).$$

(Here $\underline{V}(W)$ is the inner volume of $W$, that is, the 3-dimensional inner Jordan
content $\underline{J}(W)$; similarly, $\overline{V}(W) = \overline{J}(W)$ is the outer volume.) Because $f$ is integrable
over $S$ and $\varepsilon$ can be any positive number, the inequalities show that $V(W)$ exists and
equals the integral of $f$ over $S$.

We begin by choosing a global bound $B$ for $f$: $|f(x,y)| \le B$ on $\mathbb{R}^2$. Then, because
$A(\partial S) = 0$, we can choose $k$ so large that the squares $R_1, \ldots, R_L$ of $\mathcal{J}_k$ that meet $\partial S$
have total area less than $\varepsilon/B$. Let $Q_1, \ldots, Q_N$ be the remaining squares of $\mathcal{J}_k$ that
meet $S$; they lie entirely within $S \setminus \partial S$. Let

$$m_i = \inf_{\mathbf{p} \in Q_i} f(\mathbf{p}), \qquad \widehat{m}_l = \inf_{\mathbf{p} \in R_l} f(\mathbf{p}),$$

$$M_i = \sup_{\mathbf{p} \in Q_i} f(\mathbf{p}), \qquad \widehat{M}_l = \sup_{\mathbf{p} \in R_l} f(\mathbf{p}).$$

Taking into account the fact that every $\widehat{m}_l \le B$ and that the total area of the squares
$R_l$ is less than $\varepsilon/B$, we find that the lower Darboux sum for $f$ over $S$ is

$$\underline{D}_{\mathcal{J}_k}(f,S) = \sum_{i=1}^{N} m_i A(Q_i) + \sum_{l=1}^{L} \widehat{m}_l A(R_l) \le \sum_{i=1}^{N} m_i A(Q_i) + B \sum_{l=1}^{L} A(R_l)
< \sum_{i=1}^{N} m_i A(Q_i) + B \cdot \frac{\varepsilon}{B} = \sum_{i=1}^{N} m_i A(Q_i) + \varepsilon,$$

or

$$\underline{D}_{\mathcal{J}_k}(f,S) - \varepsilon < \sum_{i=1}^{N} m_i A(Q_i) = \sum_{i=1}^{N} V(P_i).$$

In the last sum, $P_i$ is the parallelepiped with base $Q_i$ and height $m_i$; its volume is
$m_i A(Q_i)$. These parallelepipeds are nonoverlapping sets that are entirely contained
in $W$, so their total volume is not larger than the inner volume of $W$:

$$\sum_{i=1}^{N} V(P_i) \le \underline{V}(W).$$

This gives $\underline{D}_{\mathcal{J}_k}(f,S) - \varepsilon < \underline{V}(W)$, the first of the two inequalities we must establish.

The second inequality is more straightforward. In the formula for the upper Darboux
sum,

$$\overline{D}_{\mathcal{J}_k}(f,S) = \sum_{i=1}^{N} M_i A(Q_i) + \sum_{l=1}^{L} \widehat{M}_l A(R_l),$$

each term is the volume of a parallelepiped based on one of the squares $Q_i$ or $R_l$.
These parallelepipeds are nonoverlapping and their union entirely contains $W$. Consequently,
their total volume $\overline{D}_{\mathcal{J}_k}(f,S)$ is at least as large as the outer volume of $W$:

$$\overline{V}(W) \le \overline{D}_{\mathcal{J}_k}(f,S). \qquad \square$$
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Triple integrals over 
3-dimensional regions 


Set functions 


Example: mass and 
mass density 


With cubes replacing squares, we can define and calculate the Jordan content of
a region $D$ in $\mathbb{R}^3$ (cf. p. 295). Then, modifying the exposition at the beginning of this
section by using a grid $\mathcal{G}$ whose cells are cubes $Q_i$ instead of squares, we can define
the Riemann triple integral of a function $f(x,y,z)$ over a 3-dimensional region $D$ as

$$\iiint_D f(x,y,z)\,dV = \lim_{\|\mathcal{G}\| \to 0} \sum_{i=1}^{N(\mathcal{G})} f(x_i, y_i, z_i)\,\Delta V_i,$$

where $\Delta V_i = J(Q_i)$, the Jordan content, or volume, of $Q_i$. Compare this to Definition 8.15
for double integrals. All the theorems and corollaries of this section have
natural extensions to triple integrals. In particular (cf. Theorem 8.35), a function
that is bounded and continuous on a region $D \setminus Z$, where $D$ has volume and $Z$ has
volume zero, is integrable. Having made these observations, we now assume that
triple integrals are available for our future work.


Jordan content is an example of a set function: it assigns a real number to each 
of the sets in a certain collection. There are numerous other examples, including 
integrals themselves. In many cases, a set function even has a derivative. We end this 
section by showing that the derivative of a suitable set function is a point function 
whose integral equals the original set function. This is, in fact, a version of the 
fundamental theorem of calculus. To fix ideas, we first explore some examples. 

Imagine a thin flat plate that lies over a portion of the $(x,y)$-plane, and suppose
it has a continuous but nonuniform mass distribution. Let $S$ be a subset of the plane
with positive area, and let $M(S)$ be the total mass of the portion of the plate that lies
over $S$. If $A(S)$ is the area of $S$, then

$$\frac{M(S)}{A(S)} = \text{average mass density over } S.$$

Intuitively, the mass density $\rho(x,y)$ of the plate at the point $(x,y)$ should be the limit
of $M(S)/A(S)$ as the set $S$ “shrinks down” to $(x,y)$, in the sense that $\delta(S) \to 0$ for
sets $S$ that contain $(x,y)$; $\delta(S)$ is the diameter of $S$ (Definition 8.14, p. 291). Thus,
mass distribution is a set function, mass density is a point function, and the second
is the derivative of the first. That is,

$$\text{mass density at } (x,y) = \rho(x,y) = M'(x,y) = \lim_{\delta(S) \to 0} \frac{M(S)}{A(S)}$$

for $(x,y)$ in $S$, if the limit exists. A related example is a 3-dimensional solid with
a continuous but nonuniform mass distribution. Let $D$ be any region of positive
volume $V(D)$ in $\mathbb{R}^3$ that contains the point $(x,y,z)$, and let $M(D)$ be the mass of the
portion of the solid that lies in $D$; then

$$\rho(x,y,z) = M'(x,y,z) = \lim_{\delta(D) \to 0} \frac{M(D)}{V(D)}$$

is the mass density of the solid at $(x,y,z)$, if the limit exists.

Another physical example is the hydrostatic force a liquid applies to the walls of 
its container. The force F(S) on any portion S of the surface of the container is a set 
function. If S has area, then F(S) /A(S) is the average pressure (force per unit area) 
on S; as S shrinks down to a point, this ratio approaches the pressure at that point. 
Electric charge on a plate, and the related charge density, show that a set function 
can take negative, as well as positive, values. 

For a different kind of example (using subsets of $\mathbb{R}$ instead of $\mathbb{R}^2$, for simplicity),
suppose $X$ is a random variable (cf. p. 20) that takes real values. For any subset $S$
of $\mathbb{R}$ that has a length $L(S)$, we define

$$P(S) = \text{probability that } X \text{ lies in } S.$$

Probability is a set function. The corresponding probability density function $p(x)$
should be the limit of $P(S)/L(S)$ as $S$ “shrinks down” to $x$. A common example is
the normal density function

$$P'(x) = p(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}},$$

which determines the normal probability function

$$P(S) = \frac{1}{\sqrt{2\pi}} \int_S e^{-x^2/2}\,dx.$$

Here we find a set function that is the integral of its derivative.
Integrals provide a very general class of set functions. Define

$$F(S) = \iint_S f(x,y)\,dA,$$

where $f(x,y)$ is a fixed function that is bounded and continuous on some fixed open
set $\Omega$ in $\mathbb{R}^2$. Then $F$ is a set function; it assigns a real number to each subset $S$ of $\Omega$
that has area.

Let $(x,y)$ be a point in $\Omega$, and let $S_n$ be a collection of closed subsets of $\Omega$ with
positive area that all contain $(x,y)$. Let $m_n$ and $M_n$ be the minimum and maximum
values, respectively, of $f$ on $S_n$. Then, by Corollary 8.29, page 298,

$$m_n \le \frac{F(S_n)}{A(S_n)} \le M_n.$$

Suppose $\delta(S_n) \to 0$ as $n \to \infty$. The continuity of $f$ implies that $m_n \to f(x,y)$ and
$M_n \to f(x,y)$ as $n \to \infty$. In other words,

$$\lim_{n \to \infty} \frac{F(S_n)}{A(S_n)} = f(x,y).$$

Because this limit is independent of the choice of the sets $S_n$ used to compute it, we
define it to be the derivative of $F$ at $(x,y)$, and write

$$F'(x,y) = \lim_{n \to \infty} \frac{F(S_n)}{A(S_n)}.$$

Although $F$ is a set function, its derivative $F' = f$ is a point function. We call $F$ a
set function of integral type. The following theorem summarizes our observations.

Theorem 8.39. A set function of integral type has a derivative, and the set function
is equal to the integral of its derivative. □
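
The limit that defines the derivative of a set function can be watched numerically. The following Python sketch is a rough illustration under assumptions of our own choosing: the density $\rho(x,y) = 1/(1 + x^2 + y^2)$, the point $(0.5, 0.25)$, squares $S_n$ of side $1/n$ centered at that point, and a midpoint Riemann sum standing in for the integral. The names `average_density` and `rho` are hypothetical.

```python
def average_density(f, x0, y0, side, m=50):
    """Approximate F(S)/A(S), where S is the square of the given side
    centered at (x0, y0) and F(S) is the integral of f over S, estimated
    by a midpoint Riemann sum on an m-by-m grid."""
    h = side / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            x = x0 - side / 2 + (i + 0.5) * h
            y = y0 - side / 2 + (j + 0.5) * h
            total += f(x, y) * h * h
    return total / (side * side)

def rho(x, y):
    # Illustrative mass density (an assumption, not from the text).
    return 1.0 / (1.0 + x * x + y * y)

point = (0.5, 0.25)
for n in (1, 10, 100):
    ratio = average_density(rho, point[0], point[1], 1.0 / n)
    print(n, ratio, ratio - rho(*point))
```

As the squares shrink, the ratio $F(S_n)/A(S_n)$ settles on the value of the density at the point, as Theorem 8.39 asserts for set functions of integral type.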


Note that continuous mass distributions and normal probability are both set functions
of integral type:

$$M(S) = \iint_S \rho(x,y)\,dA, \qquad M'(x,y) = \rho(x,y);$$

$$P(S) = \int_S \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx, \qquad P'(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}.$$

But not all set functions are. Here is a simple example to the contrary. Let

$$M(S) = \begin{cases} 1 & \text{if } S \text{ contains the origin,} \\ 0 & \text{otherwise.} \end{cases}$$

You can think of $M$ as the set function associated with a unit point mass concentrated
at the origin on $\mathbb{R}$. To show $M$ is not of integral type, suppose the contrary. That is,
suppose

$$M(S) = \int_S g(x)\,dx$$

for some integrable function $g(x)$ (that need not even be continuous). Now suppose
$Q_1 = [-1,0]$, $Q_2 = [0,1]$, and $S = [-1,1]$; by definition of $M$,

$$M(Q_1) = M(Q_2) = M(S) = 1.$$

However, by assumption we have

$$M(S) = \int_{-1}^{1} g(x)\,dx = \int_{-1}^{0} g(x)\,dx + \int_{0}^{1} g(x)\,dx = M(Q_1) + M(Q_2) = 2,$$

a contradiction. The contradiction arises because the integral is additive on nonoverlapping
sets (Theorem 8.27, p. 298), but $M$ is not:

$$M(S_1 \cup S_2) \ne M(S_1) + M(S_2).$$

A set function cannot be of integral type unless it possesses, at the outset, all the
relevant properties of a Riemann integral.
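
The failure of additivity is easy to see in a few lines of Python. This is a minimal sketch restricted to closed intervals $[a,b]$; the function name `point_mass` is a hypothetical label for the set function $M$ above.

```python
def point_mass(a, b):
    """M(S) for the closed interval S = [a, b]: 1 if S contains the
    origin, 0 otherwise."""
    return 1 if a <= 0 <= b else 0

left = point_mass(-1, 0)    # M(Q_1)
right = point_mass(0, 1)    # M(Q_2)
whole = point_mass(-1, 1)   # M(S), where S is the union of Q_1 and Q_2
print(left, right, whole)   # prints: 1 1 1
```

The two intervals overlap only at the origin, a set of length zero, yet `left + right` is 2 while `whole` is 1. An integral would assign the same total to both.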
Exercises

8.1. a. Adapt the program that estimates the gravitational field of a large plate
(p. 271) to a version of BASIC or a similar language and use it to reproduce
the table of values of the field that are found in the text.
b. In the original computation of the gravitational field, we assumed the plate
density was constant and took $4G\rho = 1$. Recompute all the tabular values
assuming that $4G\rho = 1/(1 + x^2 + y^2)$. This provides estimates for the
double integral



$0 \le x \le R$,
$0 \le y \le R$


c. In which case is the plate less massive, and in which case is the gravitational
field weaker? Does the less massive plate have the weaker field?

8.2. In Chapter 9.3, pages 342-343, the area of the curved region $1 \le x^2 - y^2 \le 2$,
$1 \le 2xy \le 3$, in the $(x,y)$-plane is given by the double integral



$1 \le u \le 2$,

$1 \le v \le 3$


a. Approximate the integral by a Riemann sum using a 2 × 4 grid of squares
with the integrand evaluated at the center of each square. Use a modification
of the BASIC program in the previous question to show the value of
the Riemann sum is 0.204 806.

b. Obtain additional approximations using a 20 × 40 grid and a 200 × 400
grid. How close are these to the estimate 0.205 213 found analytically on
page 342?

8.3. Adapt the previous BASIC program to estimate the value of the integral



dx dy


on the square $S$: $0.2 \le x \le 1$, $0.2 \le y \le 1$ (cf. Exercise 9.38.c, p. 384). Evaluate
the function at the center of each grid square. Show that, with a 4 × 4 grid, the
value is 2.0992 and with a 20 × 20 grid the value is 2.1156. How large must
the grid be to make the value 2.11626?

8.4. Is the interior of the complement of S equal to the complement of the interior 
of S? If not, does either of these sets always contain the other? 

8.5. Suppose b is a boundary point of S. Show that every open disk centered at b 
contains at least one point p in S and also at least one point q that is not in S. 





8.6. Suppose $\mathbf{p}$ is a point in $S$ and $\mathbf{q}$ is a point not in $S$. Show that at least one point
on any continuous curve from $\mathbf{p}$ to $\mathbf{q}$ is in $\partial S$.

8.7. Let $Q$ be a square in the grid $\mathcal{J}_k$, and let $m > k$. Show that

$$\underline{J}_m(Q) = \frac{1}{2^{2k}} \quad\text{and}\quad \overline{J}_m(Q) = \frac{1}{2^{2k}} + \frac{2^{m-k}}{2^{2m-2}} + \frac{1}{2^{2m-2}}.$$

Conclude that $Q$ is Jordan measurable and $J(Q) = 1/2^{2k}$.

8.8. Show that the rectangle $a \le x \le b$, $c \le y \le d$ has Jordan content $(b - a)(d - c)$.

8.9. Suppose $\delta > 0$, $\alpha\delta \le a < (\alpha + 1)\delta$, and $(\beta - 1)\delta < b \le \beta\delta$. Show that
$2\delta < b - a$ implies $(\alpha + 1)\delta < (\beta - 1)\delta$.

8.10. Show that $\underline{J}(S) = 0 \iff {}^{\circ}S = \emptyset$ (i.e., the interior of $S$ is empty).

8.11. Suppose $S$ is Jordan measurable and ${}^{\circ}S \subseteq T \subseteq \overline{S}$. Show that $T$ is Jordan measurable
and $J(T) = J({}^{\circ}S) = J(S) = J(\overline{S})$.

8.12. Suppose $R$, $S$, and $T$ are Jordan measurable; show that

$$J(R \cup S \cup T) = J(R) + J(S) + J(T) - J(R \cap S) - J(S \cap T) - J(T \cap R) + J(R \cap S \cap T).$$

This includes showing that all the sets on the right-hand side of the equation
are Jordan measurable.

8.13. Generalize the result in the previous exercise to four sets, and then to $p$ sets
$S_1, \ldots, S_p$.

8.14. Let $S$ and $T$ denote Jordan-measurable bounded subsets of the plane.

a. Give an example in which $\partial(T \setminus S)$ has a point in $\partial S$ that is not in $\partial T$.

b. Prove that $\partial(T \setminus S) \subseteq \partial T \cup \partial S$.

8.15. Modify the proof of Lemma 8.5 so that it works with cubes and sets in $\mathbb{R}^3$.

8.16. Let the linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$ be given by the matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

and suppose $Q$ is the square $0 \le x \le w$, $0 \le y \le w$. Show that the ratio $\sigma =
\delta(L(Q))/\delta(Q)$ of the diameters of $Q$ and its image is the larger of the two
numbers

$$\sqrt{\frac{a^2 + b^2 + c^2 + d^2 \pm 2(ab + cd)}{2}},$$

and thus depends only on $L$, not on $Q$. Make a sketch to illustrate how these
numbers are connected to $L(Q)$.
8.17. Confirm (e.g., by writing suitable programs) that the inner and outer areas, $\underline{J}_k$
and $\overline{J}_k$, of the unit disk have the values indicated in the following table.

    k    Inner Squares    Outer Squares    Inner area    Outer area
    0              0               12       0            12
    1              4               24       1             6
    2             32               68       2             4.25
    3            164              232       2.5625        3.625
    4            732              864       2.859375      3.375
    5          3 080            3 340       3.007813      3.261719
    6         12 596           13 112       3.075195      3.201172
    7         50 920           51 948       3.107910      3.170654
    8        204 836          206 888       3.125549      3.156860
    9        821 424          825 524       3.133484      3.149124


8.18. Prove Theorem 8.36. 

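8.19. Let $S$ be the unit square in $\mathbb{R}^2$, and let

$$f(x,y) = \begin{cases} 1 & \text{if } x \text{ and } y \text{ are irrational,} \\ 0 & \text{otherwise.} \end{cases}$$

For an arbitrary grid $\mathcal{G}$ determine the upper and lower Darboux sums $\overline{D}_{\mathcal{G}}(f,S)$
and $\underline{D}_{\mathcal{G}}(f,S)$. What are the values of the upper and lower Darboux integrals
of $f$ on $S$? Is $f$ Darboux integrable on $S$?

8.20. Suppose $f$ is integrable on a closed cell $Q$, and

$$m^* = \inf_{\mathbf{p} \in Q} |f(\mathbf{p})|, \qquad m = \inf_{\mathbf{p} \in Q} f(\mathbf{p}),$$

$$M^* = \sup_{\mathbf{p} \in Q} |f(\mathbf{p})|, \qquad M = \sup_{\mathbf{p} \in Q} f(\mathbf{p}).$$

Show that $M^* - m^* \le M - m$.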

8.21. Suppose a thin flat plate is a disk of radius $R$ centered at the origin of $\mathbb{R}^2$.
Suppose its mass distribution is circularly symmetric and that the mass of the
disk of radius $a$ centered at the origin is $a/(1 + a)$, for every $0 \le a \le R$.

a. What is the mass of an annulus whose radii are $a - \Delta r/2$ and $a + \Delta r/2$?

b. What is the mass $M(S)$ of the piece $S$ of this annulus cut off by radial lines
$\theta = b - \Delta\theta/2$ and $\theta = b + \Delta\theta/2$? What is the area $A(S)$?

c. Determine the mass density at the point (a, b) on the plate as 
d. Using $\rho$, verify that the mass of the disk of radius $a$ has the value it should;
that is, verify



$x^2 + y^2 \le a^2$

e. Repeat all the previous analysis assuming that the mass of the disk of radius
$a$ centered at the origin is just $a$.

8.22. Let $S$ be a closed bounded set with area in the $(x,y)$-plane. The moment of $S$
about the $y$-axis is the integral



Estimate the moment of the square $S$: $0.2 \le x \le 1$, $0.2 \le y \le 1$ about the $y$-axis
by adapting the BASIC program of Exercise 8.3, above. Use a 4 × 4 grid and
a 20 × 20 grid.



Chapter 9 

Evaluating Double Integrals 


Abstract Although the definition of the integral reflects its origins in scientific
problems, its evaluation relies on a considerable range of mathematical concepts
and tools. Most fundamental is the change of variables formula; the single-variable
version (“u-substitution”) is perhaps the core technique of integration in the introductory
calculus course. By contrast, the method of iterated integrals has no single-variable
analogue; it evaluates a double integral by “partial integration” of one variable
at a time. This chapter connects double and iterated integrals, establishes the
change of variables formula, and discusses Green’s theorem as a tool for evaluating
double integrals and as a reason for orienting them.


9.1 Iterated integrals 


We define iterated integrals in their own terms, independently of double integrals.
First, suppose that $S$ is the region in the $(x,y)$-plane that lies between the graphs
$y = \gamma(x)$ and $y = \delta(x)$ when $a \le x \le b$. We assume $\gamma$ and $\delta$ are continuous and
$\gamma(x) \le \delta(x)$ everywhere on this interval; we can write

$$S: \quad a \le x \le b, \quad \gamma(x) \le y \le \delta(x).$$

Now let $f(x,y)$ be a continuous function on $S$; for each $x$ in $[a,b]$, compute the
“partial integral” of $f(x,y)$ with respect to $y$ from $\gamma(x)$ to $\delta(x)$:

$$F_2(x) = \int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy, \qquad a \le x \le b.$$

This is a continuous function of $x$. As an example, let $f(x,y) = x^2 y^3$ and let $S$ be
the region between $y = \gamma(x) = 1/2$ and $y = \delta(x) = \sqrt{x}$ when $1/4 \le x \le 1$. Then the
partial integral is


Partial integration 



J.J. Callahan, Advanced Calculus: A Geometric View, Undergraduate Texts in Mathematics, 
DOI 10.1007/978-1-4419-7332-0 9, © Springer Science+Business Media, LLC 2010 
$$F_2(x) = \int_{1/2}^{\sqrt{x}} x^2 y^3\,dy = \left. \frac{x^2 y^4}{4} \right|_{1/2}^{\sqrt{x}} = \frac{x^2 (\sqrt{x})^4}{4} - \frac{x^2/16}{4} = \frac{x^4}{4} - \frac{x^2}{64}.$$

We can reverse the roles of the two variables if we start with a region $T$ described
in the following way,

$$T: \quad c \le y \le d, \quad \alpha(y) \le x \le \beta(y).$$

Then, for each $y$ in the interval $[c,d]$, the partial integral of $f(x,y)$ with respect to $x$
from $\alpha(y)$ to $\beta(y)$ is

$$F_1(y) = \int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx, \qquad c \le y \le d.$$

This is continuous if $\alpha$ and $\beta$ are. It can happen that a particular region can be
described both ways. This is true for our example above:

$$S: \quad 1/4 \le x \le 1, \quad 1/2 \le y \le \sqrt{x}; \qquad\text{and also}\qquad S: \quad 1/2 \le y \le 1, \quad y^2 \le x \le 1.$$

Therefore

$$F_1(y) = \int_{y^2}^{1} x^2 y^3\,dx = \left. \frac{x^3 y^3}{3} \right|_{y^2}^{1} = \frac{y^3 - y^9}{3},$$

so the two partial integrals of $x^2 y^3$ are certainly different; they are even functions of
different variables.

Now integrate each of the partial integrals $F_2(x)$ or $F_1(y)$ over its own domain:

$$\int_a^b F_2(x)\,dx = \int_a^b \left( \int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy \right) dx,$$

$$\int_c^d F_1(y)\,dy = \int_c^d \left( \int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx \right) dy.$$

Note that in each we have performed a repeated, or iterated, integration of the original
function $f(x,y)$, first with respect to one variable and then the other. These are
the iterated integrals of $f(x,y)$.

To illustrate, let us return to our example $f(x,y) = x^2 y^3$ over the region $S$. We
have

$$\int_{1/4}^{1} \left( \int_{1/2}^{\sqrt{x}} x^2 y^3\,dy \right) dx = \int_{1/4}^{1} \left( \frac{x^4}{4} - \frac{x^2}{64} \right) dx = \frac{459}{10240}$$

when the iteration is performed in one order, and

$$\int_{1/2}^{1} \left( \int_{y^2}^{1} x^2 y^3\,dx \right) dy = \int_{1/2}^{1} \frac{y^3 - y^9}{3}\,dy = \frac{459}{10240}$$

when it is performed in the other. These calculations suggest a general result: the two
iterated integrals are always equal (Corollary 9.4, below). As we show, this happens
because the iterated integrals, taken in either order, equal the double integral. Here
is the statement and proof when the domain $S$ is a rectangle.
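
The common value $459/10240 \approx 0.044824$ of the two iterated integrals in the example can be checked numerically. The sketch below is an informal check, not the method of the text: it applies a one-variable midpoint rule twice, once in each order, over the region $1/4 \le x \le 1$, $1/2 \le y \le \sqrt{x}$. The helper name `midpoint` and the choice of 400 subintervals are assumptions made for illustration.

```python
import math

def midpoint(g, lo, hi, n=400):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def f(x, y):
    return x ** 2 * y ** 3

# Integrate in y first (from 1/2 to sqrt(x)), then in x from 1/4 to 1.
dy_then_dx = midpoint(
    lambda x: midpoint(lambda y: f(x, y), 0.5, math.sqrt(x)), 0.25, 1.0)

# Integrate in x first (from y^2 to 1), then in y from 1/2 to 1.
dx_then_dy = midpoint(
    lambda y: midpoint(lambda x: f(x, y), y * y, 1.0), 0.5, 1.0)

exact = 459 / 10240
print(dy_then_dx, dx_then_dy, exact)
```

Both estimates agree with $459/10240$ to well within the accuracy of the midpoint rule, whichever variable is integrated first.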

Theorem 9.1. Suppose $f(x,y)$ is continuous on the rectangle $R$ defined by $a \le x \le b$,
$c \le y \le d$; then

$$\iint_R f(x,y)\,dA = \int_a^b \left( \int_c^d f(x,y)\,dy \right) dx = \int_c^d \left( \int_a^b f(x,y)\,dx \right) dy.$$

Proof. We prove the double integral equals the first of the two iterated integrals; to
show it also equals the second, interchange $x$ and $y$. Let

$$F_2(x) = \int_c^d f(x,y)\,dy;$$

then we show

$$\iint_R f(x,y)\,dA = \int_a^b F_2(x)\,dx$$

by proving that, for any $\varepsilon > 0$,

$$\left| \iint_R f(x,y)\,dA - \int_a^b F_2(x)\,dx \right| < \varepsilon.$$

We begin by subdividing $R$ with a grid of congruent rectangles. For positive
integers $I$ and $J$, let

$$\Delta x = \frac{b - a}{I}, \qquad \Delta y = \frac{d - c}{J},$$

and then define

$$x_1 = a + \Delta x, \qquad y_1 = c + \Delta y,$$

$$x_i = x_{i-1} + \Delta x, \quad i = 2, \ldots, I, \qquad y_j = y_{j-1} + \Delta y, \quad j = 2, \ldots, J.$$


[Figure: the rectangle $R$ subdivided by the grid lines $x_1, \ldots, x_I$ and $y_1, \ldots, y_J$ into congruent cells of width $\Delta x$ and height $\Delta y$.]

Each point $(x_i, y_j)$ is the upper right corner of its cell. The mesh size of this grid,
$\delta = \sqrt{(\Delta x)^2 + (\Delta y)^2}$, can be made as small as we wish by choosing both $I$ and $J$
sufficiently large.

Now suppose $\varepsilon > 0$ is given. Because $f$ is continuous on $R$, it is integrable there
(Theorem 8.35, p. 305). Consequently, all Riemann sums constructed using a grid
with a sufficiently small mesh will be arbitrarily close to the value of the double
integral of $f$ over $R$. Therefore, we can choose $I$ and $J$ so large that

$$\left| \iint_R f(x,y)\,dA - \sum_{i=1}^{I} \sum_{j=1}^{J} f(x_i, y_j)\,\Delta x\,\Delta y \right| < \frac{\varepsilon}{3}.$$

The continuous function $F_2$ is likewise integrable over $[a,b]$, so Riemann sums for
its integral are also arbitrarily close to the value of the integral if the partition of the
$x$-interval $[a,b]$ is fine enough. Therefore, by increasing the size of $I$, if necessary,
we can make the inequality

$$\left| \int_a^b F_2(x)\,dx - \sum_{i=1}^{I} F_2(x_i)\,\Delta x \right| < \frac{\varepsilon}{3}$$

hold as well. Finally, each $F_2(x_i)$ is itself an integral,

$$F_2(x_i) = \int_c^d f(x_i, y)\,dy, \qquad i = 1, \ldots, I,$$

and thus has Riemann sum approximations. Therefore, after a sufficiently large $I$
has been fixed, we can then increase the size of $J$, if necessary, so that all Riemann
sums for each of the integrals $F_2(x_1), \ldots, F_2(x_I)$ will be arbitrarily close to the value
of that integral:

$$\left| F_2(x_i) - \sum_{j=1}^{J} f(x_i, y_j)\,\Delta y \right| < \frac{\varepsilon}{3(b - a)}, \qquad i = 1, \ldots, I.$$
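Now consider the inequality we seek to prove. As often happens in such a proof, we begin with a telescoping sum,

\[
\begin{aligned}
\iint_R f(x,y)\,dA - \int_a^b F_2(x)\,dx
&= \iint_R f(x,y)\,dA - \sum_{i=1}^{I}\sum_{j=1}^{J} f(x_i,y_j)\,\Delta x\,\Delta y \\
&\quad + \sum_{i=1}^{I}\left(\sum_{j=1}^{J} f(x_i,y_j)\,\Delta y - F_2(x_i)\right)\Delta x \\
&\quad + \sum_{i=1}^{I} F_2(x_i)\,\Delta x - \int_a^b F_2(x)\,dx,
\end{aligned}
\]

and then apply the triangle inequality:

\[
\begin{aligned}
\left|\iint_R f(x,y)\,dA - \int_a^b F_2(x)\,dx\right|
&\le \left|\iint_R f(x,y)\,dA - \sum_{i=1}^{I}\sum_{j=1}^{J} f(x_i,y_j)\,\Delta x\,\Delta y\right| \\
&\quad + \left|\sum_{i=1}^{I}\left(\sum_{j=1}^{J} f(x_i,y_j)\,\Delta y - F_2(x_i)\right)\Delta x\right| \\
&\quad + \left|\sum_{i=1}^{I} F_2(x_i)\,\Delta x - \int_a^b F_2(x)\,dx\right|.
\end{aligned}
\]

For $I$ and $J$ large enough, the first term on the right is bounded by $\varepsilon/3$, and so is the third term; we claim the same is true for the second. We have

\[
\left|\sum_{i=1}^{I}\left(\sum_{j=1}^{J} f(x_i,y_j)\,\Delta y - F_2(x_i)\right)\Delta x\right|
\le \sum_{i=1}^{I}\left|\sum_{j=1}^{J} f(x_i,y_j)\,\Delta y - F_2(x_i)\right|\Delta x
< \sum_{i=1}^{I}\frac{\varepsilon}{3(b-a)}\,\Delta x
= \frac{\varepsilon}{3(b-a)}\cdot I\,\Delta x = \frac{\varepsilon}{3},
\]

as claimed, because $I\,\Delta x = b - a$. By what has been said above, this proves the theorem. □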


Corollary 9.2 In the iterated integration of a continuous function with constant limits of integration, the order of integration can be reversed. □
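The equality of the two iterated integrals can be checked numerically. The following is only an illustrative sketch, not part of the proof: it uses midpoint Riemann sums, and the helper `iterated_sum` and the test function $f(x,y) = xy^2$ on $0 \le x \le 1$, $0 \le y \le 2$ are choices made here for the illustration (the exact double integral is $4/3$).

```python
def iterated_sum(f, a, b, c, d, n, x_first):
    """Midpoint Riemann approximation of an iterated integral over [a,b] x [c,d].

    If x_first is True the inner sum runs over x (dx dy order);
    otherwise the inner sum runs over y (dy dx order).
    """
    dx = (b - a) / n
    dy = (d - c) / n
    xs = [a + (i + 0.5) * dx for i in range(n)]
    ys = [c + (j + 0.5) * dy for j in range(n)]
    if x_first:
        return sum(sum(f(x, y) for x in xs) * dx for y in ys) * dy
    return sum(sum(f(x, y) for y in ys) * dy for x in xs) * dx


def f(x, y):
    # Hypothetical test integrand; any continuous function would do.
    return x * y * y


dy_dx = iterated_sum(f, 0.0, 1.0, 0.0, 2.0, 200, x_first=False)
dx_dy = iterated_sum(f, 0.0, 1.0, 0.0, 2.0, 200, x_first=True)
print(dy_dx, dx_dy, 4.0 / 3.0)
```

Both orders sum the same finite collection of terms, so they agree up to rounding; what the theorem adds is that the common value converges to the double integral as the grid is refined.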


Theorem 9.3. Let $S$ be the region defined by $a \le x \le b$, $\gamma(x) \le y \le \delta(x)$, where $\gamma(x)$ and $\delta(x)$ are continuous functions of $x$ on $[a,b]$. Let $f(x,y)$ be continuous on $S$; then

\[
\iint_S f(x,y)\,dA = \int_a^b\left(\int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy\right)dx.
\]

Proof. This theorem is similar to Theorem 9.1, and can be proven in essentially the same way. We begin by constructing a rectangle

\[
R:\quad a \le x \le b, \qquad c \le y \le d,
\]

where $c \le \gamma(x)$ and $\delta(x) \le d$ for all $x$ in $[a,b]$. Now $R$ contains $S$, and if we extend $f(x,y)$ in the usual way by having $f(x,y) = 0$ outside $S$, then $f$ is continuous everywhere in $R$, except (in general) on the graphs $y = \gamma(x)$ and $y = \delta(x)$.

By Theorem 8.10 (p. 284), these graphs form a set $Z$ of area zero. Because $f$ is continuous on $R \setminus Z$, it is integrable on $R$ by Theorem 8.35 (p. 305). Moreover, because $S$ and $R \setminus S$ are nonoverlapping sets (on each of which $f$ is integrable), we have

\[
\iint_R f(x,y)\,dA = \iint_S f(x,y)\,dA + \iint_{R\setminus S} f(x,y)\,dA = \iint_S f(x,y)\,dA,
\]

because the integral is additive (Theorem 8.27) and $f = 0$ on $R \setminus S$.

Now fix $x$; then $f(x,y)$ is a bounded function of $y$ on the interval $[c,d]$, and is continuous except at the two points where $y = \gamma(x)$ and $y = \delta(x)$. Therefore, the partial integral of $f$ with respect to $y$ over $[c,d]$ exists and equals the integral over the smaller interval $[\gamma(x), \delta(x)]$ (because $f = 0$ outside that smaller interval). Let

\[
F_2(x) = \int_c^d f(x,y)\,dy = \int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy
\]

denote the common value; then $F_2(x)$ is a continuous function of $x$ on $[a,b]$. To prove the theorem it is sufficient to show that

\[
\iint_R f(x,y)\,dA = \int_a^b\left(\int_c^d f(x,y)\,dy\right)dx = \int_a^b F_2(x)\,dx.
\]

Although this equation does not follow directly from the statement of Theorem 9.1 
(because / is not continuous everywhere on R), we can show that it does follow 
from the proof. 

Cover $R$ with a grid of rectangles whose width is $\Delta x = (b-a)/I$ and whose height is $\Delta y = (d-c)/J$, and define $(x_i, y_j)$ for $i = 1,\dots,I$, $j = 1,\dots,J$, as in that proof (cf. p. 319). With these choices we can now construct the various Riemann sums that appear in the following inequality, taken from the same proof:

\[
\begin{aligned}
\left|\iint_R f(x,y)\,dA - \int_a^b F_2(x)\,dx\right|
&\le \left|\iint_R f(x,y)\,dA - \sum_{i=1}^{I}\sum_{j=1}^{J} f(x_i,y_j)\,\Delta x\,\Delta y\right| \\
&\quad + \left|\sum_{i=1}^{I}\left(\sum_{j=1}^{J} f(x_i,y_j)\,\Delta y - F_2(x_i)\right)\Delta x\right| \\
&\quad + \left|\sum_{i=1}^{I} F_2(x_i)\,\Delta x - \int_a^b F_2(x)\,dx\right|.
\end{aligned}
\]

Now choose $\varepsilon > 0$. Then, because $f$ is integrable on $R$, because $f(x_i,y)$ is integrable with respect to $y$ for each $i = 1,\dots,I$, and because $F_2$ is integrable on $[a,b]$, we can choose $I$ and $J$ large enough that each of the terms on the right is less than $\varepsilon/3$. Because $\varepsilon > 0$ is arbitrary, the left-hand side of the inequality must equal zero. By what has been said above, this completes the proof. □

The theorem holds with the roles of $x$ and $y$ reversed. That is, if $f(x,y)$ is continuous over the region $T: c \le y \le d$, $\alpha(y) \le x \le \beta(y)$, then

\[
\iint_T f(x,y)\,dA = \int_c^d\left(\int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx\right)dy.
\]


This implies the following corollary. 
Corollary 9.4 Suppose $f(x,y)$ is continuous over a region $S$ that has the alternate descriptions

\[
\begin{cases} a \le x \le b, \\ \gamma(x) \le y \le \delta(x), \end{cases}
\qquad
\begin{cases} c \le y \le d, \\ \alpha(y) \le x \le \beta(y); \end{cases}
\]

then

\[
\int_a^b\left(\int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy\right)dx = \int_c^d\left(\int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx\right)dy.
\]

Proof. Both iterated integrals equal the double integral $\iint_S f(x,y)\,dA$. □

Here are two common ways to write an iterated integral that dispense with the large parentheses:

\[
\int_a^b\left(\int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy\right)dx = \int_a^b\!\int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy\,dx = \int_a^b dx\int_{\gamma(x)}^{\delta(x)} f(x,y)\,dy;
\]
\[
\int_c^d\left(\int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx\right)dy = \int_c^d\!\int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx\,dy = \int_c^d dy\int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx.
\]

Most often, we use the first; the order of $dx$ and $dy$ indicates the order in which the partial integrations are to be carried out.

A good example of the way we can evaluate a double integral with iterated single integrals is provided by the gravitational field of a square plate (pp. 270-272) at a point above the center of the plate:

\[
\text{field at } a = \iint_S \frac{-a\,dA}{(x^2+y^2+a^2)^{3/2}}, \qquad S:\quad 0 \le x \le R,\quad 0 \le y \le R.
\]

(In this expression we have used $4G\rho = 1$ and we have written the element of area as $dA$.) As an iterated integral, the field is

\[
\text{field at } a = \int_0^R\left(\int_0^R \frac{-a\,dy}{(x^2+y^2+a^2)^{3/2}}\right)dx.
\]

The first integration, with respect to $y$, can be done with the pullback substitution $y = \sqrt{x^2+a^2}\,\tan\theta$ (see Exercise 9.1); the result is

\[
\int_0^R \frac{-a\,dy}{(x^2+y^2+a^2)^{3/2}} = \left.\frac{-ay}{(x^2+a^2)(x^2+y^2+a^2)^{1/2}}\right|_0^R = \frac{-aR}{(x^2+a^2)(x^2+R^2+a^2)^{1/2}}.
\]

The antidifferentiation needed for the second integration is more readily done with a table or a computer algebra system:

\[
\text{field at } a = \int_0^R \frac{-aR\,dx}{(x^2+a^2)(x^2+R^2+a^2)^{1/2}} = \left.-\arctan\!\left(\frac{Rx}{a\sqrt{x^2+R^2+a^2}}\right)\right|_0^R = -\arctan\!\left(\frac{R^2}{a\sqrt{2R^2+a^2}}\right).
\]


This gives us a closed-form expression for the field that can shed light on the numerical results we found in Chapter 8. First note (see Exercise 9.2) that

\[
\frac{R^2}{a\sqrt{2R^2+a^2}} = \frac{R^2}{aR\sqrt{2}\,\sqrt{1 + a^2/(2R^2)}} = \frac{R}{a\sqrt{2}} + O(a/R) \quad\text{as } a/R \to 0;
\]

from this we obtain the approximation

\[
\text{field at } a \approx -\arctan\!\left(\frac{R}{a\sqrt{2}}\right).
\]

The following table gives the field strength (with $R = 32$) as determined by a Riemann sum (the numerical estimate from Chapter 8.1), by the approximation immediately above, and by the complete formula derived from the iterated integrals.

    a       Numerical estimate    -arctan(R/(a√2))    -arctan(R²/(a√(2R²+a²)))
    0.2     -1.561957...          -1.561957722        -1.561957636
    0.1     -1.566...             -1.566376938        -1.566376927
    0.05    -1.568...             -1.568586622        -1.568586620
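The two formula columns of the table can be reproduced directly. This is a minimal sketch using only Python's standard `math` module; the function names `field_exact` and `field_approx` are labels chosen here, and the normalization $4G\rho = 1$ from the text is assumed.

```python
import math


def field_exact(a, R):
    """Closed-form field of the square plate: -arctan(R^2 / (a*sqrt(2R^2 + a^2)))."""
    return -math.atan(R * R / (a * math.sqrt(2 * R * R + a * a)))


def field_approx(a, R):
    """Approximation valid for small a/R: -arctan(R / (a*sqrt(2)))."""
    return -math.atan(R / (a * math.sqrt(2)))


for a in (0.2, 0.1, 0.05):
    print(a, field_approx(a, 32.0), field_exact(a, 32.0))

# As R grows with a fixed, the exact value approaches -pi/2.
print(field_exact(1.0, 1e9), -math.pi / 2)
```

The gap between the two columns shrinks with $a/R$, in line with the $O(a/R)$ error term in the argument of the arctangent.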


We also noted in Chapter 8 that the computations suggest the field becomes constant as the size of the plate increases, that is, as $R \to \infty$. The formula confirms this: because

\[
\lim_{R\to\infty} \frac{R^2}{a\sqrt{2R^2+a^2}} = \lim_{R\to\infty} \frac{R}{a\sqrt{2}} = \infty,
\]

we can say

\[
\text{field of infinite plate at height } a = -\arctan(\infty) = -\frac{\pi}{2} = -1.570796327,
\]

a value that is independent of the height $a$.

To continue our illustration, let us evaluate the field of a circular plate using Cartesian coordinates. Of course polar coordinates lead to a simpler evaluation; however, we have already done that (cf. pp. 273-275), and got the result

\[
\text{field at } a = 2\pi G\rho\left(\frac{a}{\sqrt{a^2+R^2}} - 1\right).
\]

Cartesian coordinates give us the chance to use iterated integrals to compare calculations.
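We can define the region occupied by the circular plate as

\[
S:\quad -R \le x \le R, \qquad -\sqrt{R^2-x^2} \le y \le \sqrt{R^2-x^2};
\]

therefore,

\[
\text{field at } a = \iint_S \frac{-G\rho a\,dA}{(x^2+y^2+a^2)^{3/2}} = -G\rho a\int_{-R}^{R}\left(\int_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}} \frac{dy}{(x^2+y^2+a^2)^{3/2}}\right)dx.
\]

We have already (p. 323) computed the inner antiderivative, and found

\[
\int_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}} \frac{dy}{(x^2+y^2+a^2)^{3/2}} = \left.\frac{y}{(x^2+a^2)(x^2+y^2+a^2)^{1/2}}\right|_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}} = \frac{2\sqrt{R^2-x^2}}{(x^2+a^2)(R^2+a^2)^{1/2}}.
\]

Consequently (again resorting to tables or a computer algebra system to find the antiderivative),

\[
\text{field at } a = \frac{-2G\rho a}{\sqrt{R^2+a^2}}\int_{-R}^{R} \frac{\sqrt{R^2-x^2}}{x^2+a^2}\,dx
= \frac{-2G\rho a}{\sqrt{R^2+a^2}}\left[-\arctan\!\left(\frac{x}{\sqrt{R^2-x^2}}\right) + \frac{\sqrt{R^2+a^2}}{a}\arctan\!\left(\frac{x\sqrt{R^2+a^2}}{a\sqrt{R^2-x^2}}\right)\right]_{-R}^{R}.
\]

Because

\[
\left.\arctan\!\left(\frac{x}{\sqrt{R^2-x^2}}\right)\right|_{x=\pm R} = \arctan(\pm\infty) = \pm\frac{\pi}{2}
\]

and likewise

\[
\left.\arctan\!\left(\frac{x\sqrt{R^2+a^2}}{a\sqrt{R^2-x^2}}\right)\right|_{x=\pm R} = \arctan(\pm\infty) = \pm\frac{\pi}{2},
\]

we have

\[
\begin{aligned}
\text{field at } a &= \frac{-2G\rho a}{\sqrt{R^2+a^2}}\left(-\frac{\pi}{2} + \frac{\pi}{2}\frac{\sqrt{R^2+a^2}}{a} - \frac{\pi}{2} + \frac{\pi}{2}\frac{\sqrt{R^2+a^2}}{a}\right) \\
&= \frac{2\pi G\rho a}{\sqrt{R^2+a^2}}\left(1 - \frac{\sqrt{R^2+a^2}}{a}\right) = 2\pi G\rho\left(\frac{a}{\sqrt{R^2+a^2}} - 1\right),
\end{aligned}
\]

in agreement with the computation using polar coordinates.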
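As a numerical cross-check of the Cartesian computation, the outer $x$-integral can be approximated by a midpoint rule and compared with the polar-coordinate formula. This is only a sketch: the function names, the sample values $a = 1$, $R = 2$, and the normalization $G\rho = 1$ are assumptions made for the illustration.

```python
import math


def circular_plate_cartesian(a, R, n=100000, G_rho=1.0):
    """Midpoint rule for the outer x-integral; the inner y-integral is used in closed form."""
    dx = 2.0 * R / n
    total = 0.0
    for i in range(n):
        x = -R + (i + 0.5) * dx
        total += math.sqrt(R * R - x * x) / (x * x + a * a) * dx
    return -2.0 * G_rho * a / math.sqrt(R * R + a * a) * total


def circular_plate_polar(a, R, G_rho=1.0):
    """Closed form obtained with polar coordinates: 2*pi*G*rho*(a/sqrt(a^2+R^2) - 1)."""
    return 2.0 * math.pi * G_rho * (a / math.sqrt(a * a + R * R) - 1.0)


print(circular_plate_cartesian(1.0, 2.0), circular_plate_polar(1.0, 2.0))
```

The integrand has a square-root behavior at $x = \pm R$, so the midpoint rule converges more slowly there than for a smooth integrand; a large `n` compensates.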

Theorem 9.3 allows us to reduce a double integral to iterated single integrals; however, it holds only for a restricted class of regions. For example, the horseshoe-shaped region shown in the margin does not meet the restriction (at least for Cartesian coordinates). Nevertheless, it is the union of a finite number of nonoverlapping sets (e.g., the five whose boundaries are shown by the white lines) that, separately, do meet the restriction. The following theorem asserts that this is enough.

Theorem 9.5. Suppose $f(x,y)$ is continuous on a bounded region $R$ that is the union of nonoverlapping sets $S_1, \dots, S_p$, $T_1, \dots, T_q$ of the form

\[
S_i:\ \begin{cases} a_i \le x \le b_i, \\ \gamma_i(x) \le y \le \delta_i(x), \end{cases}
\qquad
T_j:\ \begin{cases} c_j \le y \le d_j, \\ \alpha_j(y) \le x \le \beta_j(y); \end{cases}
\]

then

\[
\begin{aligned}
\iint_R f(x,y)\,dA ={}& \int_{a_1}^{b_1}\!\int_{\gamma_1(x)}^{\delta_1(x)} f(x,y)\,dy\,dx + \cdots + \int_{a_p}^{b_p}\!\int_{\gamma_p(x)}^{\delta_p(x)} f(x,y)\,dy\,dx \\
&+ \int_{c_1}^{d_1}\!\int_{\alpha_1(y)}^{\beta_1(y)} f(x,y)\,dx\,dy + \cdots + \int_{c_q}^{d_q}\!\int_{\alpha_q(y)}^{\beta_q(y)} f(x,y)\,dx\,dy.
\end{aligned}
\]

Proof. Because $R = S_1 \cup \cdots \cup S_p \cup T_1 \cup \cdots \cup T_q$ is a decomposition into nonoverlapping sets, we have

\[
\iint_R f(x,y)\,dA = \iint_{S_1} f(x,y)\,dA + \cdots + \iint_{S_p} f(x,y)\,dA + \iint_{T_1} f(x,y)\,dA + \cdots + \iint_{T_q} f(x,y)\,dA
\]

by the additivity of the integral (Theorem 8.27, p. 298). The result now follows by reducing each double integral on the right to the appropriate iterated integral. □

9.2 Improper integrals 

The integral of a function is defined using values of the function and the sizes of small regions, so it is natural to deal only with bounded functions over closed bounded regions. However, scientific and mathematical questions just as naturally involve unbounded functions and unbounded regions, so it is important to extend the process of integration to these more general settings. Such extensions are called improper integrals; they are evaluated as limits of “proper” integrals.

By a proper integral we mean one whose value is determined, in principle, as a limit of Riemann sums. For example,

\[
\int_0^1 \frac{dx}{\sqrt{x}}
\]

is not a proper integral in this sense. Of course, in any Riemann sum for $f(x) = 1/\sqrt{x}$ we must avoid the point $x = 0$ because $f(0)$ is undefined. However, this is not the heart of the problem. To see what is, subdivide the interval $(0,1]$ into equal subintervals of length $\Delta x = 1/k$, for any positive integer $k$. Then form a Riemann sum $\Sigma$ whose first term is

\[
f(1/k^3)\cdot\frac{1}{k} = \frac{\sqrt{k^3}}{k} = \sqrt{k}.
\]

For the remaining terms in $\Sigma$, make any valid choices. Because those terms are all positive, we have

\[
\Sigma > \sqrt{k} \to \infty \quad\text{as } k \to \infty.
\]

Because these Riemann sums do not converge, there is no integral. The problem is not that $f(0)$ is undefined but that $f(x)$ is unbounded on that first interval $(0, \Delta x]$. (We have already used this example in a slightly different form for a similar purpose on page 299.) Theorem 8.36 (p. 307) confirms this; it says that if $f(x)$ were bounded on $(0,1]$, it would be integrable on $[0,1]$.

On any smaller interval $[a,1] \subset (0,1]$, $f(x)$ is bounded and continuous and therefore integrable. Because this integral gives the area under the part of the graph that lies above the interval $[a,1]$ on the $x$-axis, and because that area increases monotonically as $[a,1] \to (0,1]$, it seems reasonable to define

\[
\int_0^1 \frac{dx}{\sqrt{x}} = \lim_{a\to 0^+}\int_a^1 \frac{dx}{\sqrt{x}}
\]

if the values on the right converge to a finite limit. In fact,

\[
\int_a^1 \frac{dx}{\sqrt{x}} = 2\sqrt{x}\,\Big|_a^1 = 2 - 2\sqrt{a} \to 2 \quad\text{as } a \to 0^+.
\]

Thus we can say the improper integral “converges” and has the value 2:

\[
\int_0^1 \frac{dx}{\sqrt{x}} = 2.
\]
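The limiting process can be watched numerically. In this sketch (the helper name `proper_integral` and the sample cutoffs are choices made here), each proper integral over $[a,1]$ is approximated by a midpoint rule and compared with the closed form $2 - 2\sqrt{a}$.

```python
import math


def proper_integral(a, n=100000):
    """Midpoint approximation of the proper integral of 1/sqrt(x) over [a, 1]."""
    dx = (1.0 - a) / n
    return sum(dx / math.sqrt(a + (i + 0.5) * dx) for i in range(n))


for a in (1e-1, 1e-2, 1e-3, 1e-4):
    # Numerical value, closed form 2 - 2*sqrt(a); both approach 2 as a -> 0.
    print(a, proper_integral(a), 2.0 - 2.0 * math.sqrt(a))
```

Each computed value is a genuine limit of Riemann sums on a closed interval where the integrand is bounded; only the final passage $a \to 0^+$ lies outside the theory of proper integrals.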


Is this argument unnecessarily elaborate and painstaking? It would appear that we could just write

\[
\int_0^1 \frac{dx}{\sqrt{x}} = 2\sqrt{x}\,\Big|_0^1 = 2
\]

and get the correct value. However, this computation uses the fundamental theorem of calculus, which says that

\[
\int_a^b f(x)\,dx = F(b) - F(a)
\]

when $f(x)$ is continuous on $[a,b]$ and $F'(x) = f(x)$ there. But in our case the fundamental theorem fails to apply, because $f(x) = 1/\sqrt{x}$ is not continuous on $[0,1]$. To integrate $1/\sqrt{x}$ over $(0,1]$, we must extend the definition of integral in some fashion; the ordinary, or “proper,” integral does not exist.

[Figure: graph of y = 1/√x on (0,1]; the first rectangle of the Riemann sum has width 1/k and height √(k³), so its area is √k.]

More generally, to define the improper integral of a function $f(x)$ that is continuous but unbounded on the open interval $a < x < b$, first take $a < \alpha < \beta < b$ and compute the ordinary integral

\[
I(\alpha,\beta) = \int_\alpha^\beta f(x)\,dx
\]

as a function of its endpoints $\alpha$ and $\beta$. (The integral exists because $f$ is bounded and continuous on $[\alpha,\beta]$.) If the values $I(\alpha,\beta)$ have a finite limit as $\alpha \to a$ and $\beta \to b$, then the improper integral converges and its value is that limit:

\[
\int_a^b f(x)\,dx = \lim_{\substack{\alpha\to a \\ \beta\to b}} \int_\alpha^\beta f(x)\,dx.
\]

Still more generally, if $f(x)$ is continuous on the closed interval $[a,b]$ except for the points $c_1 < c_2 < \cdots < c_k$ at which it becomes unbounded (and $a \le c_1$, $c_k \le b$), then we define the improper integral

\[
\int_a^b f(x)\,dx = \int_a^{c_1} f(x)\,dx + \int_{c_1}^{c_2} f(x)\,dx + \cdots + \int_{c_k}^{b} f(x)\,dx
\]

if all the intermediate improper integrals on the right converge.

Note that the intermediate improper integrals must converge separately and independently of each other. For example, the integral of $1/x$ over $[-1,1]$ is improper because $1/x$ is unbounded as $x \to 0$, so we must write

\[
\int_{-1}^{1} \frac{dx}{x} = \int_{-1}^{0} \frac{dx}{x} + \int_{0}^{1} \frac{dx}{x}.
\]

But this fails to converge, because neither improper integral on the right converges:

\[
\int_{-1}^{0} \frac{dx}{x} = \lim_{\beta\to 0^-}\int_{-1}^{\beta} \frac{dx}{x} = \lim_{\beta\to 0^-} \ln|\beta| = -\infty,
\qquad
\int_{0}^{1} \frac{dx}{x} = \lim_{\alpha\to 0^+}\int_{\alpha}^{1} \frac{dx}{x} = \lim_{\alpha\to 0^+} -\ln\alpha = +\infty.
\]

If we were to link the two intermediate integrals in the following way,

\[
\int_{-1}^{1} \frac{dx}{x} = \lim_{\alpha\to 0^+}\left(\int_{-1}^{-\alpha} \frac{dx}{x} + \int_{\alpha}^{1} \frac{dx}{x}\right),
\]

we would reach the false conclusion

\[
\int_{-1}^{1} \frac{dx}{x} = \lim_{\alpha\to 0^+} (\ln\alpha - \ln\alpha) = 0.
\]


There are improper integrals for unbounded domains as well as for unbounded functions. If we try to calculate a Riemann sum for a function over an unbounded domain, one of the cells must have infinite size. However, there is a natural way to define an improper integral. Assuming that $f$ is bounded and integrable on every finite subinterval of $a \le x < \infty$ or $-\infty < x \le b$, respectively, we set

\[
\int_a^\infty f(x)\,dx = \lim_{B\to\infty}\int_a^B f(x)\,dx, \qquad \int_{-\infty}^b f(x)\,dx = \lim_{A\to-\infty}\int_A^b f(x)\,dx.
\]

Thus, for example,

\[
\int_0^\infty \frac{dx}{1+x^2} = \lim_{B\to\infty}\int_0^B \frac{dx}{1+x^2} = \lim_{B\to\infty} \arctan x\,\Big|_0^B = \lim_{B\to\infty} \arctan B = \frac{\pi}{2}.
\]

We sometimes find a sequence like this abbreviated to

\[
\int_0^\infty \frac{dx}{1+x^2} = \arctan x\,\Big|_0^\infty = \arctan\infty = \frac{\pi}{2},
\]

but we must always understand that the briefer calculation depends on the validity of the longer one.

We can now turn from single to double integrals. Suppose $R$ is a closed bounded region with area in $\mathbb{R}^2$ and $Z$ a set of area zero. If $f(x,y)$ is bounded and continuous on $R \setminus Z$, then we know $f$ is “properly” integrable over $R$ (Theorem 8.35, p. 305). But suppose we allow $f$ to become unbounded on $R \setminus Z$ while remaining continuous there; can we define the improper integral of $f$ over $R$?

Single integrals suggest that we consider a monotonically increasing sequence $S_1 \subset S_2 \subset \cdots$ of closed subsets of $R \setminus Z$ for which $A(S_k) \to A(R)$ as $k \to \infty$. On each $S_k$, $f$ is continuous and bounded, so it has a “proper” integral

\[
I_k = \iint_{S_k} f(x,y)\,dA.
\]

If the sequence $I_1, I_2, \dots$ has a finite limit, $I$, we would like to say that the improper integral of $f$ over $R$ converges and has the value $I$.

However, in any definition that involves choices (as this does with the sequence $S_1, S_2, \dots$), we must make certain that the outcome does not depend on the choices made. Thus, if $T_1 \subset T_2 \subset \cdots$ is another sequence of closed subsets of $R \setminus Z$ for which $A(T_m) \to A(R)$, and

\[
J_m = \iint_{T_m} f(x,y)\,dA,
\]

then we must verify that the sequence $J_1, J_2, \dots$ also has a finite limit, $J$, and then that $J = I$.




Here is an example that illustrates how much variability there can be in the outcome. Consider $f(x,y) = 1/x$ on the square $R: -1 \le x, y \le 1$. Of course, $f$ is undefined on the $y$-axis $Z: x = 0$, and is continuous but unbounded on $R \setminus Z$.

Let $V_k$ be the infinite strip $-1/k < x < 1/k$ that is centered symmetrically on the $y$-axis, and let $S_k = R \setminus V_k$. The sets $S_k$ are nested monotonically and $A(S_k) = 4 - 4/k \to 4 = A(R)$; furthermore,

\[
I_k = \iint_{S_k} \frac{dA}{x} = \int_{-1}^{1}\!\int_{-1}^{-1/k} \frac{dx}{x}\,dy + \int_{-1}^{1}\!\int_{1/k}^{1} \frac{dx}{x}\,dy = 0
\]

by symmetry. In fact,

\[
\int_{-1}^{1}\!\int_{-1}^{-1/k} \frac{dx}{x}\,dy = -\int_{-1}^{1}\!\int_{1/k}^{1} \frac{dx}{x}\,dy;
\]

that is, the contributions to $I_k$ from the left-half plane and the right-half plane exactly cancel.

By contrast, let $W_m$ be the asymmetric strip $-1/m < x < 1/m^2$, and let $T_m = R \setminus W_m$. Now $A(T_m) = 4 - 2/m - 2/m^2$, so we still have $A(T_m) \to A(R)$ as $m \to \infty$. But this time the cancellation is incomplete; the integral over $T_m$ reduces to

\[
J_m = \iint_{T_m} \frac{dA}{x} = \int_{-1}^{1}\!\int_{1/m^2}^{1/m} \frac{dx}{x}\,dy = 2\left(\ln\frac{1}{m} - \ln\frac{1}{m^2}\right) = 2\ln m.
\]

(See the exercises.) Because $J_m \to \infty$, the two sets of integrals do not converge the same way, so the improper integral fails to exist.
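The dependence on the exhausting sets is easy to tabulate. The sketch below evaluates the integral of $1/x$ over the square with a strip $-\text{left} < x < \text{right}$ removed, using the closed-form inner integrals; the function name `strip_integral` and its parameters are introduced here only for the illustration.

```python
import math


def strip_integral(left, right):
    """Integral of 1/x over -1 <= x, y <= 1 with the strip -left < x < right removed."""
    # Inner x-integrals: over [-1, -left] the value is ln(left);
    # over [right, 1] the value is -ln(right).
    inner = math.log(left) - math.log(right)
    # The integrand does not depend on y, so the y-integration contributes a factor 2.
    return 2.0 * inner


for n in (10, 100, 1000):
    I_n = strip_integral(1.0 / n, 1.0 / n)        # symmetric strip V_n
    J_n = strip_integral(1.0 / n, 1.0 / n ** 2)   # asymmetric strip W_n
    print(n, I_n, J_n, 2.0 * math.log(n))
```

The symmetric exhaustion returns 0 at every stage, while the asymmetric one grows like $2\ln n$, even though both families of sets fill out the square.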

We can attribute the variability of the outcomes to the way $f$ changes sign on $R$. If we replace $f$ by $|f|$ then that variability disappears. In fact, the integrals

\[
\bar I_k = \iint_{S_k} |f(x,y)|\,dA \quad\text{and}\quad \bar J_m = \iint_{T_m} |f(x,y)|\,dA
\]

are now both unbounded monotonic increasing sequences of numbers.

Compare what is happening with the integrals to what can happen with certain infinite series. For example,

\[
1 - \frac12 + \frac13 - \frac14 + \frac15 - \frac16 + \frac17 - \frac18 + \frac19 - \cdots = \ln 2;
\]

that is, the sequence of partial sums $1,\ 1 - \tfrac12,\ 1 - \tfrac12 + \tfrac13,\ \dots$ has the limiting value $\ln 2$. But a rearrangement of the terms can change the sum:

\[
1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \frac19 + \frac1{11} - \frac16 + \cdots = \frac32\ln 2.
\]

(Instead of strictly alternating positive and negative terms, the new series includes two positive terms for every negative one; see the exercises.) Choosing the order of terms here is analogous to choosing how the subsets $S_k$ and $T_m$ expand to fill out $R \setminus Z$. In both cases, different choices lead to different outcomes. Finally, replacing $f$ by $|f|$ is analogous to making all terms in the series be positive. In this case we get the harmonic series:

\[
1 + \frac12 + \frac13 + \frac14 + \frac15 + \frac16 + \frac17 + \frac18 + \frac19 + \cdots = \infty.
\]


The harmonic series diverges; its sequence of partial sums is monotonically increasing and unbounded. Because the alternating series for $\ln 2$ converges but the related series of absolute values (the harmonic series) does not, we say the series for $\ln 2$ is conditionally convergent. By contrast, both the alternating (geometric) series

\[
1 - \frac12 + \frac14 - \frac18 + \frac1{16} - \frac1{32} + \frac1{64} - \frac1{128} + \cdots = \frac23
\]

and the corresponding series of absolute values

\[
1 + \frac12 + \frac14 + \frac18 + \frac1{16} + \frac1{32} + \frac1{64} + \frac1{128} + \cdots = 2
\]

converge, so we say the alternating series for $\tfrac23$ is absolutely convergent. An absolutely convergent series is more robust: rearranging its terms does not change its value.
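The effect of rearranging a conditionally convergent series can be seen in partial sums. This sketch compares the alternating harmonic series with the rearrangement that takes two positive terms for every negative one; the function names and the number of terms are choices made here.

```python
import math


def alternating_partial_sum(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... with n_terms terms."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def rearranged_partial_sum(n_blocks):
    """Partial sum of 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ... with n_blocks blocks of three terms."""
    total = 0.0
    for b in range(1, n_blocks + 1):
        # Block b: two odd-denominator terms, then one even-denominator term.
        total += 1.0 / (4 * b - 3) + 1.0 / (4 * b - 1) - 1.0 / (2 * b)
    return total


print(alternating_partial_sum(100000), math.log(2))
print(rearranged_partial_sum(100000), 1.5 * math.log(2))
```

Both sums use exactly the same terms in the long run; only the order in which positive and negative terms are consumed differs.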

The improper integral we define is the analogue of an absolutely convergent series; its value will not change when we change the way the region $R \setminus Z$ is “filled up” by closed subsets $S_k$ or $T_m$.


Theorem 9.6. Let $R$ be a closed bounded set with area, let $Z$ be a set with area zero, and let $S_1, S_2, \dots$ be a monotonic increasing sequence of closed subsets of $R \setminus Z$ for which $A(S_k) \to A(R)$ as $k \to \infty$. Suppose $f(x,y)$ is continuous but unbounded on $R \setminus Z$, but

\[
\bar I_k = \iint_{S_k} |f(x,y)|\,dA \le B
\]

for some bound $B$ and for all $k$. Then the numbers

\[
I_k = \iint_{S_k} f(x,y)\,dA
\]

have a finite limit $I$ as $k \to \infty$. Furthermore, the value of $I$ is independent of the way the closed subsets $S_k$ are chosen.

Proof. To show that various limits exist we use the Cauchy convergence criterion: the sequence $a_1, a_2, \dots$ of real numbers has a finite limit if and only if, for every $\varepsilon > 0$, there is an $N = N(\varepsilon)$ such that

\[
i, j > N \implies |a_i - a_j| < \varepsilon.
\]

The criterion says that the limit exists if all the $a_i$ are arbitrarily close to one another when $i$ is sufficiently large; for a proof see a text on analysis.

We first show that the integrals $I_k$ converge for a particular collection of sets $S_k$. Suppose $i > j$; then $S_i \supseteq S_j$ so we have

\[
\iint_{S_i} f(x,y)\,dA = \iint_{S_j} f(x,y)\,dA + \iint_{S_i\setminus S_j} f(x,y)\,dA,
\]

and similarly for $|f(x,y)|$. Because $|f(x,y)| \ge 0$, the sequence $\bar I_1, \bar I_2, \dots$ is monotonic increasing; by hypothesis, it is bounded above so it has a finite limit. Therefore, by the Cauchy convergence criterion, we know that for any $\varepsilon > 0$, we can find an $N$ for which

\[
\bar I_i - \bar I_j = |\bar I_i - \bar I_j| < \varepsilon
\]

whenever $i > j > N$. But

\[
\bar I_i - \bar I_j = \iint_{S_i} |f(x,y)|\,dA - \iint_{S_j} |f(x,y)|\,dA = \iint_{S_i\setminus S_j} |f(x,y)|\,dA,
\]

so

\[
\iint_{S_i\setminus S_j} |f(x,y)|\,dA < \varepsilon.
\]

For any closed set $Q$ in $R \setminus Z$, we have

\[
\left|\iint_Q f(x,y)\,dA\right| \le \iint_Q |f(x,y)|\,dA.
\]

Therefore, when $i > j > N$ we have

\[
|I_i - I_j| = \left|\iint_{S_i} f(x,y)\,dA - \iint_{S_j} f(x,y)\,dA\right| = \left|\iint_{S_i\setminus S_j} f(x,y)\,dA\right| \le \iint_{S_i\setminus S_j} |f(x,y)|\,dA < \varepsilon.
\]

By the Cauchy convergence criterion, the sequence $I_1, I_2, \dots$ converges to a finite limit.

Now let $T_1 \subset T_2 \subset \cdots$ be another sequence of closed sets with $A(T_m) \to A(R)$. We claim

\[
\bar J_m = \iint_{T_m} |f(x,y)|\,dA \le B
\]

for the same bound $B$. The foregoing proof would then imply that the sequence

\[
J_m = \iint_{T_m} f(x,y)\,dA
\]

also has a limit, $J$.

To prove the claim, let $T$ be any one of the sets $T_m$. We know $f(x,y)$ is bounded on $T$: $|f(x,y)| \le M$ for some $M$ (that depends on $T$). Because $T \setminus (T \cap S_k) = T \setminus S_k$, we have

\[
\iint_T g(x,y)\,dA - \iint_{T\cap S_k} g(x,y)\,dA = \iint_{T\setminus S_k} g(x,y)\,dA,
\]

where $g(x,y)$ stands for either $f(x,y)$ or $|f(x,y)|$. Therefore,

\[
\left|\iint_T g(x,y)\,dA - \iint_{T\cap S_k} g(x,y)\,dA\right| \le \iint_{T\setminus S_k} |f(x,y)|\,dA \le M\cdot A(T\setminus S_k)
\]

by Corollary 8.30. Now $A(T \setminus S_k) \to 0$ as $k \to \infty$, because $T \setminus S_k \subseteq R \setminus S_k$ and we have

\[
A(T\setminus S_k) \le A(R\setminus S_k) = A(R) - A(S_k)
\]

by Lemma 8.3, and $A(S_k) \to A(R)$ by hypothesis. It follows that

\[
\iint_T f(x,y)\,dA = \lim_{k\to\infty}\iint_{T\cap S_k} f(x,y)\,dA \quad\text{and}\quad \iint_T |f(x,y)|\,dA = \lim_{k\to\infty}\iint_{T\cap S_k} |f(x,y)|\,dA.
\]

Using the second equation and $T \cap S_k \subseteq S_k$, we find

\[
\iint_T |f(x,y)|\,dA = \lim_{k\to\infty}\iint_{T\cap S_k} |f(x,y)|\,dA \le \lim_{k\to\infty}\iint_{S_k} |f(x,y)|\,dA \le B,
\]

proving the claim and showing that the limit $J$ exists.

To prove that $I = J$, we begin by noting that

\[
\begin{aligned}
\left|\iint_T f(x,y)\,dA - \iint_{T\cap S_j} f(x,y)\,dA\right|
&= \lim_{i\to\infty}\left|\iint_{T\cap S_i} f(x,y)\,dA - \iint_{T\cap S_j} f(x,y)\,dA\right| \\
&= \lim_{i\to\infty}\left|\iint_{T\cap(S_i\setminus S_j)} f(x,y)\,dA\right|
\le \lim_{i\to\infty}\iint_{S_i\setminus S_j} |f(x,y)|\,dA \le \varepsilon,
\end{aligned}
\]

a result that holds for all $j > N$, where $N$ was the number provided by the Cauchy convergence criterion for the sequence $\bar I_1, \bar I_2, \dots$. (In particular, this $N$ is independent of the choice of $T$.) The initial equality uses

\[
\iint_T f(x,y)\,dA = \lim_{i\to\infty}\iint_{T\cap S_i} f(x,y)\,dA.
\]

Furthermore, because the sequence $\bar J_1, \bar J_2, \dots$ also converges, there is a number $L$ for which

\[
|\bar J_l - \bar J_m| < \varepsilon
\]

for all $l > m > L$. Reversing the roles of $S_k$ and $T_m$ we can therefore conclude that

\[
\left|\iint_S f(x,y)\,dA - \iint_{S\cap T_m} f(x,y)\,dA\right| \le \varepsilon
\]

for every $m > L$ and any closed subset $S$ of $R \setminus Z$.

Finally, the telescoping sum

\[
\begin{aligned}
I - J ={}& I - \iint_{S_k} f(x,y)\,dA + \iint_{S_k} f(x,y)\,dA - \iint_{S_k\cap T_m} f(x,y)\,dA \\
&+ \iint_{S_k\cap T_m} f(x,y)\,dA - \iint_{T_m} f(x,y)\,dA + \iint_{T_m} f(x,y)\,dA - J
\end{aligned}
\]

leads to the triangle inequality

\[
\begin{aligned}
|I - J| \le{}& \left|I - \iint_{S_k} f(x,y)\,dA\right| + \left|\iint_{S_k} f(x,y)\,dA - \iint_{S_k\cap T_m} f(x,y)\,dA\right| \\
&+ \left|\iint_{S_k\cap T_m} f(x,y)\,dA - \iint_{T_m} f(x,y)\,dA\right| + \left|\iint_{T_m} f(x,y)\,dA - J\right|.
\end{aligned}
\]

If we choose $k$ and $m$ sufficiently large, each of the four terms on the right will be bounded by $\varepsilon$, so $|I - J| \le 4\varepsilon$. Because $\varepsilon > 0$ is arbitrary, $I = J$. □

Definition 9.1 Suppose $R$ is a closed bounded set with area, $Z$ a set with area zero, and $S_k$ is a monotonic increasing collection of closed subsets of $R \setminus Z$ for which $A(S_k) \to A(R)$ as $k \to \infty$. Suppose $f(x,y)$ is continuous but unbounded on $R \setminus Z$, and the integrals

\[
\iint_{S_k} |f(x,y)|\,dA
\]

are uniformly bounded in $k$. Then the improper integral of $f$ over $R$ is

\[
\iint_R f(x,y)\,dA = \lim_{k\to\infty}\iint_{S_k} f(x,y)\,dA.
\]

When the improper integral exists, we often say that it converges.

How “unbounded” can a function be and still have a convergent improper integral? For example,

\[
f(x,y) = \frac{1}{r^p}, \qquad r = \sqrt{x^2+y^2},
\]

is unbounded near the origin when $p > 0$; for which values of $p$ does

\[
\iint_{x^2+y^2\le 1} \frac{dA}{r^p}
\]

converge? The figure suggests $1/r^p$ becomes unbounded more rapidly as $p$ becomes larger; thus, the integral (thought of as the volume under the graph) is more likely to converge for smaller values of $p$. To answer the question, let $S_k$ be the set of points $(x,y)$ in the ring $1/k^2 \le x^2+y^2 \le 1$. Then, changing to polar coordinates, we find

\[
I_k = \iint_{S_k} \frac{dA}{r^p} = \int_0^{2\pi}\!\int_{1/k}^{1} \frac{r\,dr}{r^p}\,d\theta = 2\pi\,\frac{r^{2-p}}{2-p}\,\Big|_{1/k}^{1} = \frac{2\pi}{2-p}\left(1 - k^{p-2}\right).
\]

This formula does not allow $p = 2$, so we deal with that case separately.

First, the sequence $I_1, I_2, \dots$ has a finite limit as $k \to \infty$ only if $k^{p-2} \to 0$, that is, only if $p - 2 < 0$, or $p < 2$. If $p = 2$, then

\[
I_k = \int_0^{2\pi}\!\int_{1/k}^{1} \frac{dr}{r}\,d\theta = 2\pi\ln r\,\Big|_{1/k}^{1} = 2\pi\ln k \to \infty.
\]

Thus the improper integral converges if and only if $0 < p < 2$, in which case it has the value

\[
\iint_{x^2+y^2\le 1} \frac{dA}{r^p} = \frac{2\pi}{2-p}.
\]

(Of course, this formula also gives the value of the “proper” integral that exists for all $p \le 0$.) Our example has the following useful generalization.
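The threshold at $p = 2$ shows up clearly when the closed form for $I_k$ is tabulated. This is a minimal sketch; the function name `ring_integral` and the sample values of $p$ and $k$ are choices made here.

```python
import math


def ring_integral(p, k):
    """Integral of r**(-p) over the ring 1/k <= r <= 1, from the closed forms in the text."""
    if p == 2:
        return 2.0 * math.pi * math.log(k)
    return 2.0 * math.pi / (2 - p) * (1 - k ** (p - 2))


for p in (1, 1.5, 2, 3):
    # For p < 2 the values level off at 2*pi/(2 - p); for p >= 2 they grow without bound.
    print(p, [ring_integral(p, k) for k in (10, 1000, 10 ** 6)])
```

For $p < 2$ the correction term $k^{p-2}$ dies away, slowly when $p$ is close to 2; for $p \ge 2$ no limit exists.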

Because $|f(x,y)| = 1/r^p = f(x,y)$, the improper integral of $f$ converges “absolutely” or not at all; we do not need to test separately whether the integrals of $|f(x,y)|$ are uniformly bounded, as stipulated in Theorem 9.6.

For another example, take $g(x,y) = \ln r$. It is also unbounded near the origin, but because $\ln r < 0$ when $0 < r < 1$, we should first consider $|g(x,y)| = -\ln r$. Thus, on the rings $S_k: 1/k^2 \le x^2+y^2 \le 1$ that we just used,

\[
\bar I_k = \iint_{1/k^2\le x^2+y^2\le 1} |g(x,y)|\,dA = \int_0^{2\pi}\!\int_{1/k}^{1} -r\ln r\,dr\,d\theta.
\]

The function $z = -r\ln r$ is continuous and bounded by $1/e$ on $0 < r \le 1$, so we have

\[
\int_{1/k}^{1} -r\ln r\,dr \le \frac{1}{e}\left(1 - \frac{1}{k}\right) < \frac{1}{e},
\]

implying the uniform bound $\bar I_k < 2\pi/e$ for all $k$. Therefore, by Theorem 9.6, the improper integral

\[
\iint_{x^2+y^2\le 1} \ln\sqrt{x^2+y^2}\,dA
\]

converges. (It converges to $-\pi/2$; see Exercise 9.22.)
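The stated limit $-\pi/2$ can be checked from the antiderivative of $r\ln r$. The sketch below is an illustration only; the function name `log_ring_integral` is introduced here.

```python
import math


def log_ring_integral(k):
    """Integral of ln(r) over the ring 1/k <= r <= 1, in polar coordinates."""
    def antiderivative(r):
        # d/dr [ r^2/2 * ln r - r^2/4 ] = r ln r
        return 0.5 * r * r * math.log(r) - 0.25 * r * r
    return 2.0 * math.pi * (antiderivative(1.0) - antiderivative(1.0 / k))


for k in (10, 100, 10 ** 6):
    print(k, log_ring_integral(k), -math.pi / 2)
```

Every value stays within the uniform bound $2\pi/e$ in absolute value, as the estimate in the text requires.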

An integral will also be improper when its domain $R$ is unbounded. As in the earlier case of an unbounded function, choose a monotonic increasing sequence of closed bounded subsets $S_1 \subseteq S_2 \subseteq \cdots$ of $R$. Because it no longer makes sense to require $A(S_k) \to A(R)$ (because the area of $R$ may be infinite), we achieve what we really want, that the sets $S_k$ eventually cover $R$, by stipulating instead that each closed bounded subset of $R$ be contained in some $S_k$. As in the earlier case, we also require absolute convergence.

Theorem 9.7. Let $R$ be an unbounded set in $\mathbb{R}^2$, $Z$ a set with area zero, and $S_1, S_2, \dots$ a monotonic increasing sequence of closed bounded subsets of $R$ such that, given any closed bounded subset $W$ of $R$, $S_k \supseteq W$ for some $k$. Suppose $f(x,y)$ is bounded and continuous on $R \setminus Z$, and

\[
\bar I_k = \iint_{S_k} |f(x,y)|\,dA \le B
\]

for some bound $B$ and for all $k$. Then the numbers

\[
I_k = \iint_{S_k} f(x,y)\,dA
\]

have a finite limit $I$ as $k \to \infty$. Furthermore, the value of $I$ is independent of the way the closed subsets $S_k$ are chosen.

Proof. This proof has many parallels with the previous one; we focus on the points where the two differ. To begin, because $f$ and $|f|$ are bounded and continuous on $R \setminus Z$, the same is true on each $S_k \setminus Z$, so $f$ and $|f|$ are integrable on each $S_k$. This is enough to establish, as in the earlier proof, that the sequence $I_1, I_2, \dots$ has a finite limit.

The next step is to consider a second monotonic increasing sequence of closed bounded subsets $T_1, T_2, \dots$ that exhaust $R$ the same way the sequence $S_1, S_2, \dots$ does. Each $T_m$ is a closed bounded subset of $R$; thus it is entirely contained in some $S_k$, by hypothesis. Hence,

\[
\bar J_m = \iint_{T_m} |f(x,y)|\,dA \le \iint_{S_k} |f(x,y)|\,dA \le B.
\]

As noted in the earlier proof, this implies that the integrals

\[
J_m = \iint_{T_m} f(x,y)\,dA
\]

converge to a finite limit, $J$.

To prove that $I = J$, the earlier proof first established that

\[
\left|\iint_{T_m} f(x,y)\,dA - \iint_{T_m\cap S_k} f(x,y)\,dA\right| \le \varepsilon
\]

for all $k > N$, and uniformly for all $T_m$. There, the key step was that

\[
\iint_{T_m} f(x,y)\,dA = \lim_{i\to\infty}\iint_{T_m\cap S_i} f(x,y)\,dA;
\]

here, this holds for the simple reason that $T_m \cap S_i = T_m$ for all $i$ sufficiently large. Reversing the roles of $S_k$ and $T_m$ we likewise conclude that

\[
\left|\iint_{S_k} f(x,y)\,dA - \iint_{S_k\cap T_m} f(x,y)\,dA\right| \le \varepsilon
\]

for every $m > L$ and for all $S_k$. Then $|I - J| \le 4\varepsilon$ as before. □


Definition 9.2 When the conditions of the previous theorem are met, then the improper integral of $f$ over $R$ converges to

\[
\iint_R f(x,y)\,dA = \lim_{k\to\infty}\iint_{S_k} f(x,y)\,dA.
\]

In Chapter 1, we met one of the standard examples of an improper integral over an unbounded domain:

\[
\iint_{\mathbb{R}^2} e^{-(x^2+y^2)/2}\,dA = 2\pi.
\]

We used this to evaluate another improper integral,

\[
\int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = \sigma\sqrt{2\pi},
\]

that relates to the density function of the normal probability distribution of statistics.
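Theorem 9.7 says the limit does not depend on how the plane is exhausted. As a small illustration (not taken from the text), the sketch below evaluates the Gaussian integral over growing disks and over growing squares, using closed forms; the function names are chosen here, and `math.erf` is the standard-library error function.

```python
import math


def over_disk(radius):
    """Integral of exp(-(x^2+y^2)/2) over the disk of the given radius (polar coordinates)."""
    return 2.0 * math.pi * (1.0 - math.exp(-radius ** 2 / 2.0))


def over_square(half_side):
    """Same integrand over the square |x|, |y| <= half_side, as a product of 1-D integrals."""
    one_dim = math.sqrt(2.0 * math.pi) * math.erf(half_side / math.sqrt(2.0))
    return one_dim ** 2


for size in (1, 2, 4, 10):
    print(size, over_disk(size), over_square(size), 2.0 * math.pi)
```

The two exhaustions approach $2\pi$ at different rates but reach the same limit, as expected for a positive (hence absolutely convergent) integrand.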


9.3 The change of variables formula 

This book begins with the change of variables formula for single integrals. It says that when there is an invertible pullback function $x = \varphi(s)$, then

\[
\int_a^b f(x)\,dx = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(s))\,\varphi'(s)\,ds.
\]

Our goal here is to state and prove the analogous formula for double integrals. As we show in a moment, the single integral takes orientation into account; however, we have not yet defined double integrals with orientation. At this stage, therefore, we must suppress the information about orientation in the single-integral formula to carry through an analogy. Here is an example that illustrates both the problem and the solution.

Let f{x) = lnx/x 2 ; we can see by eye that the value of the integral 


l 




is positive but less than 0.2. (The vertical scales of the graphs shown in the mar¬ 
gin have been doubled for clarity.) To find the value using the change of variables 
formula, consider the pullback x = tp{s) = l/s. Then (p'{s ) = — 1/s 2 , and 


l 


2 \nx 
—r- ax = 
x z 





L 


1/2 

Ins ds. 


Note that the new integrand, $\ln s$, is negative on the new interval $[1/2, 1]$. More significantly, the new integration is carried out in the negative sense, from 1 backwards to 1/2. These two “negative” aspects of the new integral combine to produce a positive value,

$$\int_1^{1/2} \ln s\,ds = \Big[\, s\ln s - s \,\Big]_1^{1/2} = \tfrac12\ln\tfrac12 - \tfrac12 - (\ln 1 - 1) = \tfrac12(1 - \ln 2) \approx 0.15.$$
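The computation above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper `midpoint_integral` is a hypothetical name, and it uses a signed step so that integrating "backwards" from 1 to 1/2 reverses the sign, just as the oriented integral does.

```python
import math

def midpoint_integral(f, a, b, n=20000):
    """Oriented midpoint-rule approximation of the integral of f from a to b.
    If b < a the step h is negative, so the orientation is respected."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Original integral: f(x) = ln(x)/x^2 from x = 1 to x = 2.
original = midpoint_integral(lambda x: math.log(x) / x**2, 1.0, 2.0)

# After the pullback x = 1/s: integrand ln(s), taken from s = 1 backwards to s = 1/2.
transformed = midpoint_integral(math.log, 1.0, 0.5)

# Closed form found in the text.
exact = 0.5 * (1.0 - math.log(2.0))

print(original, transformed, exact)
```

Both numerical values should agree with $\tfrac12(1-\ln 2)$ to many digits: the negative integrand and the reversed direction of integration cancel.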


What we see in this example always happens when $\varphi$ is orientation-reversing: $\varphi'(s) < 0$ changes the sign of the integrand, and the oriented interval $a \to b$ is transformed into the oriented interval $\varphi^{-1}(a) \leftarrow \varphi^{-1}(b)$; that is, $a < b$ implies $\varphi^{-1}(a) > \varphi^{-1}(b)$.

We therefore need to reformulate the way we write a single integral so as to suppress this information about orientation. The unoriented version of the change of variables formula has the following form,

$$\int_I f(x)\,dx = \int_{\varphi^{-1}(I)} f(\varphi(s))\,|\varphi'(s)|\,ds.$$


In this formula, $I$ stands for the unoriented set of real numbers $x$ that lie between $a$ and $b$, inclusive, and $\varphi^{-1}(I)$ is the unoriented set of real numbers $s$ for which $\varphi(s)$ lies in $I$. The integral over the unoriented domain $I$ is defined in terms of the usual integral, as follows.

$$\int_I f(x)\,dx = \begin{cases} \displaystyle\int_a^b f(x)\,dx & \text{if } I = [a,b], \\[2ex] \displaystyle\int_b^a f(x)\,dx & \text{if } I = [b,a]. \end{cases}$$

Before we prove that the new, unoriented, change of variables formula is the correct modification of the original one, let us verify that it works on the example $f(x) = \ln x/x^2$ with $x = \varphi(s) = 1/s$. Because $\varphi^{-1}([1,2]) = [1/2,1]$ and $|\varphi'(s)| = +1/s^2$,

$$\int_{[1,2]} \frac{\ln x}{x^2}\,dx = \int_{[1/2,1]} -\ln s\,ds = \Big[\, s - s\ln s \,\Big]_{1/2}^{1} = \tfrac12(1 - \ln 2).$$


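To prove that the oriented change of variables formula leads to the unoriented one, let us assume $a < b$; we can use a similar argument if $b < a$. If $\varphi$ is orientation-preserving, then $|\varphi'(s)| = \varphi'(s)$ and $\varphi^{-1}(a) < \varphi^{-1}(b)$, so

$$\int_{\varphi^{-1}(I)} f(\varphi(s))\,|\varphi'(s)|\,ds = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(s))\,\varphi'(s)\,ds = \int_a^b f(x)\,dx = \int_I f(x)\,dx.$$

If $\varphi$ is orientation-reversing, then $|\varphi'(s)| = -\varphi'(s)$ and $\varphi^{-1}(b) < \varphi^{-1}(a)$, so

$$\int_{\varphi^{-1}(I)} f(\varphi(s))\,|\varphi'(s)|\,ds = \int_{\varphi^{-1}(b)}^{\varphi^{-1}(a)} f(\varphi(s))\,(-\varphi'(s)\,ds) = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(s))\,\varphi'(s)\,ds = \int_a^b f(x)\,dx = \int_I f(x)\,dx.$$

The new formula holds in both cases.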

We can now formulate an analogous change of variables formula for double integrals. Let $f(x,y)$ be a continuous function on a domain $D$ in $\mathbb{R}^2$, and assume that $D$ is a closed bounded set that has area. Then the integral

$$\iint_D f(x,y)\,dx\,dy$$

exists (Theorem 8.35, p. 305). We use “$dx\,dy$” here in place of “$dA$” as a way to keep track of the variables that appear in different integrals. If the change of variables is given by the pullback substitution

$$\varphi : \begin{cases} x = x(s,t), \\ y = y(s,t), \end{cases}$$

then we show that the integral is transformed by

$$\iint_D f(x,y)\,dx\,dy = \iint_{\varphi^{-1}(D)} f(\varphi(s,t))\,\big|\det d\varphi_{(s,t)}\big|\,ds\,dt.$$

In particular, $|\det d\varphi_{(s,t)}|$ corresponds to $|\varphi'(s)|$; this is the absolute value of the Jacobian,

$$J_\varphi(s,t) = \det d\varphi_{(s,t)} = \begin{vmatrix} \dfrac{\partial x}{\partial s}(s,t) & \dfrac{\partial x}{\partial t}(s,t) \\[2ex] \dfrac{\partial y}{\partial s}(s,t) & \dfrac{\partial y}{\partial t}(s,t) \end{vmatrix} = \frac{\partial(x,y)}{\partial(s,t)}.$$

Consequently,

$$\iint_{\varphi^{-1}(D)} f(\varphi(s,t))\,\big|\det d\varphi_{(s,t)}\big|\,ds\,dt = \iint_{\varphi^{-1}(D)} f(x(s,t),y(s,t))\,\left|\frac{\partial(x,y)}{\partial(s,t)}\right|\,ds\,dt$$

gives us an alternate expression for the transformed integral that is useful in the following work. Moreover, the notation suggests that, when the variables themselves change, then

$$dx\,dy \quad\text{changes to}\quad \left|\frac{\partial(x,y)}{\partial(s,t)}\right|\,ds\,dt.$$

For the moment, this is just a mnemonic. Before proving the theorem that establishes the change of variables formula for double integrals, let us first explore some examples.

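Most familiar is the change to polar coordinates that we have already used several times:

$$\begin{cases} x = r\cos\theta, \\ y = r\sin\theta; \end{cases} \qquad \frac{\partial(x,y)}{\partial(r,\theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r;$$

$$\iint_D f(x,y)\,dx\,dy = \iint_{\varphi^{-1}(D)} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$$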
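As an informal check on the polar formula, the sketch below estimates the Jacobian by central differences and then integrates $x^2 + y^2$ over the unit disk in polar form. The function names (`polar`, `jacobian_det`, `polar_integral`), the test point, and the test integrand are choices made here for illustration; they do not come from the text.

```python
import math

def polar(r, theta):
    """The pullback (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(phi, s, t, h=1e-6):
    """Central-difference estimate of det d(phi) at the point (s, t)."""
    x_s = (phi(s + h, t)[0] - phi(s - h, t)[0]) / (2 * h)
    y_s = (phi(s + h, t)[1] - phi(s - h, t)[1]) / (2 * h)
    x_t = (phi(s, t + h)[0] - phi(s, t - h)[0]) / (2 * h)
    y_t = (phi(s, t + h)[1] - phi(s, t - h)[1]) / (2 * h)
    return x_s * y_t - x_t * y_s

# The Jacobian of the polar map should be close to r (here r = 2).
det_at_point = jacobian_det(polar, 2.0, 0.7)

def polar_integral(f, r_max, n_r=400, n_theta=400):
    """Midpoint sum of f(r cos t, r sin t) * r over [0, r_max] x [0, 2 pi]."""
    dr, dt = r_max / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            x, y = polar(r, (j + 0.5) * dt)
            total += f(x, y) * r * dr * dt
    return total

# Integral of x^2 + y^2 over the unit disk; the exact value is pi/2.
disk_value = polar_integral(lambda x, y: x * x + y * y, 1.0)
print(det_at_point, disk_value, math.pi / 2)
```

The factor `r` inside the sum is the Jacobian; dropping it would give $2\pi/3$ instead of $\pi/2$ for this integrand.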


With single integrals, we change variables to simplify the integrand. With double integrals, there is a second reason: to simplify the domain. One example is an integral with circular symmetry; it is often recast into polar coordinates. A second example is the integral

$$\iint_D (x^2 + y^2)\,dx\,dy,$$

where the domain $D$ is the curvilinear quadrilateral in the first quadrant whose points $(x,y)$ satisfy the inequalities

$$1 \le x^2 - y^2 \le 2, \qquad 1 \le 2xy \le 3.$$


The sides of $D$ are hyperbolic arcs. The quadratic map

$$\mathbf{g} : \begin{cases} u = x^2 - y^2, \\ v = 2xy, \end{cases}$$

that we analyzed on pages 116-121 straightens these arcs. For example, the hyperbola $x^2 - y^2 = 1$ in the first quadrant of the $(x,y)$-plane becomes the line $u = 1$ in the first quadrant of the $(u,v)$-plane. The quadrilateral $D$ becomes the rectangle

$$R = \mathbf{g}(D) : \quad 1 \le u \le 2, \quad 1 \le v \le 3.$$


Unfortunately, $\mathbf{g}$ is a push-forward map, not a pullback, so the change of variables formula does not apply directly. But $\mathbf{g}$ is invertible on the first quadrant, and $\mathbf{g}^{-1}$ does indeed pull back $(x,y)$ to $(u,v)$, so we let $\mathbf{g}^{-1}$ play the role of $\varphi$ and write

$$\iint_D f(x,y)\,dx\,dy = \iint_{\mathbf{g}(D)} f(\mathbf{g}^{-1}(u,v))\,\big|\det d\mathbf{g}^{-1}_{(u,v)}\big|\,du\,dv.$$


To evaluate the right-hand side we can use formulas for the components of $\mathbf{g}^{-1}(u,v)$ that appear in Exercise 4.13 (p. 144). However, we can actually determine everything we need without recourse to those formulas. To begin,

$$u^2 = x^4 - 2x^2y^2 + y^4 \quad\text{and}\quad v^2 = 4x^2y^2,$$

so $u^2 + v^2 = x^4 + 2x^2y^2 + y^4 = (x^2 + y^2)^2$ and

$$f(x,y) = x^2 + y^2 = \sqrt{u^2 + v^2} = f(\mathbf{g}^{-1}(u,v)).$$


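Next,

$$\det d\mathbf{g}^{-1} = \frac{1}{\det d\mathbf{g}} = \frac{1}{4(x^2 + y^2)} = \frac{1}{4\sqrt{u^2 + v^2}} > 0,$$

so the integrand of the transformed integral is just

$$f(\mathbf{g}^{-1}(u,v))\,\big|\det d\mathbf{g}^{-1}\big| = \sqrt{u^2 + v^2}\cdot\frac{1}{4\sqrt{u^2 + v^2}} = \frac14.$$

Therefore, the change of variables formula gives

$$\iint_D (x^2 + y^2)\,dx\,dy = \iint_R \frac14\,du\,dv = \frac14\,\text{area}\,R = \frac12.$$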


(Of course we could have tried to evaluate the original double integral by reducing 
it to iterated integrals, but they lack the simplicity and elegance of the transformed 
double integral.) 
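For comparison, the value $1/2$ can be approximated directly in the $(x,y)$-plane, without any change of variables. The sketch below is only an illustration: the bounding box $[1.0, 1.7] \times [0.3, 1.1]$ is an assumption chosen here to contain $D$, and the names `in_D` and `integral_over_D` are hypothetical.

```python
def in_D(x, y):
    """Membership test for D: 1 <= x^2 - y^2 <= 2 and 1 <= 2xy <= 3."""
    u, v = x * x - y * y, 2 * x * y
    return 1.0 <= u <= 2.0 and 1.0 <= v <= 3.0

def integral_over_D(f, n=600):
    """Midpoint sum of f over the grid cells whose centers lie in D.
    The box [1.0, 1.7] x [0.3, 1.1] is assumed to contain D."""
    x0, x1, y0, y1 = 1.0, 1.7, 0.3, 1.1
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            if in_D(x, y):
                total += f(x, y) * dx * dy
    return total

direct = integral_over_D(lambda x, y: x * x + y * y)
transformed = 0.25 * 2.0  # (1/4) * area R, with R = [1, 2] x [1, 3]
print(direct, transformed)
```

The direct sum needs hundreds of thousands of function evaluations to get a few correct digits, while the transformed integral is a one-line computation; this is the practical payoff of straightening the domain.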

We can now formulate the change of variables formula for double integrals under a general push-forward substitution $\mathbf{g}$ that has an inverse:

$$\mathbf{g} : \begin{cases} u = u(x,y), \\ v = v(x,y); \end{cases} \qquad \mathbf{g}^{-1} : \begin{cases} x = x(u,v), \\ y = y(u,v). \end{cases}$$

If we write the Jacobian as

$$J_{\mathbf{g}^{-1}}(u,v) = \det d\mathbf{g}^{-1}_{(u,v)} = \frac{\partial(x,y)}{\partial(u,v)},$$

then the change of variables formula takes the form

$$\iint_D f(x,y)\,dx\,dy = \iint_{\mathbf{g}(D)} f(x(u,v),y(u,v))\,\left|\frac{\partial(x,y)}{\partial(u,v)}\right|\,du\,dv.$$

Note that we have expressed the transformed integral in terms of the component functions $x(u,v)$ and $y(u,v)$ of $\mathbf{g}^{-1}$. But usually only the components of $\mathbf{g}$ itself are given; it may be difficult or impossible to find closed-form expressions for $x(u,v)$ and $y(u,v)$. This can make it impractical to transform an integral by a push-forward substitution.

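The change of variables formula for double integrals also gives us a way to determine areas. To continue the last example, we have

$$\text{area}\,D = \iint_D 1\,dx\,dy = \iint_R \frac{1}{4\sqrt{u^2 + v^2}}\,du\,dv.$$

One way to continue is to convert the double integral into an iterated integral:

$$\iint_R \frac{1}{4\sqrt{u^2 + v^2}}\,du\,dv = \frac14 \int_1^3 \left( \int_1^2 \frac{du}{\sqrt{u^2 + v^2}} \right) dv.$$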


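This can be evaluated using a computer algebra system (or a table of integrals). We can also use the pullback substitution $u = v\sinh(s)$ (see Exercise 1.16, p. 23) to get

$$\int_1^2 \frac{du}{\sqrt{u^2 + v^2}} = \operatorname{arcsinh}\frac{u}{v}\,\bigg|_1^2 = \operatorname{arcsinh}\frac{2}{v} - \operatorname{arcsinh}\frac{1}{v}.$$

Integration by parts (when $v > 0$) gives

$$\int 1\cdot\operatorname{arcsinh}\frac{a}{v}\,dv = v\cdot\operatorname{arcsinh}\frac{a}{v} - \int v\cdot\frac{-a}{v\sqrt{v^2 + a^2}}\,dv = v\cdot\operatorname{arcsinh}\frac{a}{v} + a\cdot\operatorname{arcsinh}\frac{v}{a},$$

so we get

$$\int_1^3 \left( \operatorname{arcsinh}\frac{2}{v} - \operatorname{arcsinh}\frac{1}{v} \right) dv = \left[\, v\cdot\operatorname{arcsinh}\frac{2}{v} + 2\cdot\operatorname{arcsinh}\frac{v}{2} - v\cdot\operatorname{arcsinh}\frac{1}{v} - \operatorname{arcsinh} v \,\right]_1^3 = 0.820853.$$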


Finally, incorporating the factor 1/4, we find area $D \approx 0.205213$.

We can also get an approximation to area $D$ by approximating the value of the double integral by a Riemann sum. The integrand $J(u,v) = 1/4\sqrt{u^2 + v^2}$ is, of course, the local area magnification factor for the map $\mathbf{g}^{-1} : R \to D$. Therefore, we can estimate the area of $D$ as follows. Divide $R$ into small rectangular cells $Q_i$ of area $\Delta u\,\Delta v$; multiply that area by the value of $J$ at the center of $Q_i$ to approximate the area of the image $\mathbf{g}^{-1}(Q_i)$ in $D$; add the results. The table below does this with $R$ partitioned into eight squares of area 1/4. The accumulated sum of $J\,\Delta u\,\Delta v$ is tallied in the right column; it yields the estimate 0.204806 for the area of $D$ (cf. Exercise 8.2, p. 313).


   u       v       J(u,v)      Sum
  1.25    1.25    0.141421    0.035355
  1.25    1.75    0.116248    0.064417
  1.25    2.25    0.097129    0.088699
  1.25    2.75    0.082761    0.109390
  1.75    1.25    0.116248    0.138451
  1.75    1.75    0.101015    0.163705
  1.75    2.25    0.087706    0.185632
  1.75    2.75    0.076696    0.204806
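The table and the closed-form value can both be reproduced with a few lines of code. This is a sketch for checking the arithmetic, not the method used to produce the table in the text; the names `J`, `midpoint_area`, and `F` are chosen here, and `F` encodes the antiderivative $v\,\operatorname{arcsinh}(a/v) + a\,\operatorname{arcsinh}(v/a)$ obtained by integration by parts.

```python
import math

def J(u, v):
    """Local area magnification factor of g^{-1}: 1 / (4 sqrt(u^2 + v^2))."""
    return 1.0 / (4.0 * math.sqrt(u * u + v * v))

def midpoint_area(n_u, n_v):
    """Midpoint Riemann sum of J over R = [1, 2] x [1, 3]."""
    du, dv = 1.0 / n_u, 2.0 / n_v
    return sum(J(1 + (i + 0.5) * du, 1 + (j + 0.5) * dv) * du * dv
               for i in range(n_u) for j in range(n_v))

def F(v, a):
    """Antiderivative of arcsinh(a / v) with respect to v, for v > 0."""
    return v * math.asinh(a / v) + a * math.asinh(v / a)

coarse = midpoint_area(2, 4)      # the eight squares of side 1/2 used in the table
fine = midpoint_area(200, 400)    # a much finer grid for comparison
closed_form = 0.25 * ((F(3, 2) - F(3, 1)) - (F(1, 2) - F(1, 1)))

print(coarse, fine, closed_form)
```

Even the eight-cell sum is within about 0.2% of the closed-form area, because the magnification factor varies slowly over $R$.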






For a linear map $L$, $|\det L|$ is the magnification factor for areas (by Theorem 8.22, p. 292):

$$A(L(S)) = |\det L|\,A(S),$$

when $S$ is any subset of the plane that has area. (Here area is nonnegative; in the following section we consider oriented regions that are assigned negative area.) For a nonlinear map $\varphi(s,t)$, the connection between $A(\varphi(S))$ and $A(S)$ is not so simple, but the change of variables formula still allows us to write

$$A(\varphi(S)) = \iint_S \big|\det d\varphi_{(s,t)}\big|\,ds\,dt = \iint_S |J_\varphi(s,t)|\,ds\,dt.$$

Using the language of set functions (see below, p. 352), we show how this equation makes $|J_\varphi(s,t)|$ the local area magnification factor for $\varphi$. Here, it is the crucial “base case” of the change of variables formula for double integrals. We state it now as a theorem.

Theorem 9.8. Let $\Omega$ be a bounded open set in $\mathbb{R}^2$, and let $\varphi : \Omega \to \mathbb{R}^2$ be a continuously differentiable map that has a continuously differentiable inverse $\varphi^{-1} : \varphi(\Omega) \to \Omega$. Suppose the set $S$ has area and its closure $\overline{S} = S \cup \partial S$ lies within $\Omega$; then $\varphi(S)$ has area and

$$A(\varphi(S)) = \iint_S |J_\varphi(s,t)|\,ds\,dt,$$

where $J_\varphi(s,t)$ is the Jacobian of $\varphi$ at $(s,t)$.

Proof. The proof is simple in principle; it follows an argument given by J. Schwartz [16]. First, partition $S$ into small pieces $Q_i$. On $Q_i$, choose a representative value for the area magnification factor $|J_\varphi| = |\det d\varphi|$. Then the area of the image $\varphi(Q_i)$ is approximately $|J_\varphi|\,A(Q_i)$, and the sum of such terms approximates the area of $\varphi(S)$.

However, the details of the proof are numerous and lengthy; they involve several steps that we write as separate lemmas. We first need an open set $U$ containing $\overline{S}$ on whose closure the functions $\varphi$, $d\varphi$, and $J_\varphi$ are uniformly continuous.

Lemma 9.1. There is an open set $U$ for which $\overline{S} \subset U \subset \overline{U} \subset \Omega$.

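Proof. Let $\mathbf{p}$ be a point in $\mathbb{R}^2$; define the function

$$d(\mathbf{p}) = \min_{\mathbf{s} \in \overline{S}} \|\mathbf{p} - \mathbf{s}\|$$

that gives the distance from $\mathbf{p}$ to the closed set $\overline{S}$. Then $d(\mathbf{p}) = 0$ if and only if $\mathbf{p}$ is in $\overline{S}$; in particular, $d(\Omega^c) > 0$ because $\overline{S} \subset \Omega$. But $\Omega^c$ is closed and $d$ is continuous, so $d(\mathbf{p})$ attains its minimum $m > 0$ at some point $\mathbf{p}_0$ in $\Omega^c$: $d(\Omega^c) \ge d(\mathbf{p}_0) = m > 0$. To complete the proof of the lemma, we can take

$$U = \{\mathbf{p} : d(\mathbf{p}) < m/2\}, \qquad \overline{U} = \{\mathbf{p} : d(\mathbf{p}) \le m/2\}. \qquad □$$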


The next lemma makes the first step in connecting $A(\varphi(Q))$ to the integral of $|J_\varphi|$ over $Q$, where $Q$ is a square in one of the original grids used to define Jordan content. It says that the outer area of the image $\varphi(Q)$ is bounded by the maximum value of $|J_\varphi|$ on $Q$.

Lemma 9.2. For any given $\varepsilon > 0$, there is a positive integer $K$ such that if $Q \subset U$ is a square of $\mathcal{J}_k$ and $k \ge K$, then

$$\overline{A}(\varphi(Q)) \le (M + O(\varepsilon))\,A(Q),$$

where $M$ is the maximum value of $|J_\varphi|$ on $Q$.



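Proof. The idea of the proof is to compare the action of $\varphi$ to the action of its linear approximation $d\varphi_{\mathbf{a}}$ taken at the lower-left hand corner $\mathbf{a}$ of $Q$. In terms of local (or “window”) coordinates

$$\Delta\mathbf{s} = \mathbf{s} - \mathbf{a} \quad\text{and}\quad \Delta\mathbf{x} = \mathbf{x} - \varphi(\mathbf{a})$$

centered at this corner and its image, the two maps are

$$\Delta\mathbf{x} = \Delta\varphi_{\mathbf{a}}(\Delta\mathbf{s}) = \varphi(\mathbf{a} + \Delta\mathbf{s}) - \varphi(\mathbf{a}) \quad\text{and}\quad \Delta\mathbf{x} = d\varphi_{\mathbf{a}}(\Delta\mathbf{s}).$$

Because $d\varphi_{\mathbf{a}}$ is linear, the image $d\varphi_{\mathbf{a}}(Q)$ is a parallelogram, and

$$A(d\varphi_{\mathbf{a}}(Q)) = |\det d\varphi_{\mathbf{a}}|\,A(Q) = |J_\varphi(\mathbf{a})|\,A(Q) \le M \times A(Q).$$

We now use the continuity of $\varphi$ and $d\varphi_{\mathbf{a}}$ to show that, when $Q$ is sufficiently small, the two images $d\varphi_{\mathbf{a}}(Q)$ and $\Delta\varphi_{\mathbf{a}}(Q)$ are close enough so that our bound on the area of the first leads to a (slightly larger) bound on the outer area of the second.

For any point $\Delta\mathbf{s}$, we want to determine the difference

$$\mathbf{e} = \Delta\varphi_{\mathbf{a}}(\Delta\mathbf{s}) - d\varphi_{\mathbf{a}}(\Delta\mathbf{s}).$$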


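It is convenient to work with the component functions of $\Delta\varphi_{\mathbf{a}}$ and $d\varphi_{\mathbf{a}}$. With $\mathbf{a} = (a,b)$ and $\Delta\mathbf{s} = (\Delta s, \Delta t)$, these are

$$\Delta\varphi_{\mathbf{a}} : \begin{cases} \Delta x = x(a + \Delta s, b + \Delta t) - x(a,b), \\ \Delta y = y(a + \Delta s, b + \Delta t) - y(a,b), \end{cases} \qquad d\varphi_{\mathbf{a}} : \begin{cases} \Delta x = x_s(a,b)\,\Delta s + x_t(a,b)\,\Delta t, \\ \Delta y = y_s(a,b)\,\Delta s + y_t(a,b)\,\Delta t. \end{cases}$$

Applying the law of the mean (Theorem 3.5, p. 75) to each component of $\Delta\varphi_{\mathbf{a}}$, we get

$$\Delta\varphi_{\mathbf{a}} : \begin{cases} \Delta x = x_s(a_1,b_1)\,\Delta s + x_t(a_1,b_1)\,\Delta t, \\ \Delta y = y_s(a_2,b_2)\,\Delta s + y_t(a_2,b_2)\,\Delta t, \end{cases}$$

where $(a_1,b_1)$ and $(a_2,b_2)$ are two properly chosen points on the line connecting $(a,b)$ and $(a + \Delta s, b + \Delta t)$. This allows us to write $\mathbf{e} = \Delta\varphi_{\mathbf{a}}(\Delta\mathbf{s}) - d\varphi_{\mathbf{a}}(\Delta\mathbf{s})$ as

$$\mathbf{e} : \begin{cases} \Delta x = (x_s(a_1,b_1) - x_s(a,b))\,\Delta s + (x_t(a_1,b_1) - x_t(a,b))\,\Delta t, \\ \Delta y = (y_s(a_2,b_2) - y_s(a,b))\,\Delta s + (y_t(a_2,b_2) - y_t(a,b))\,\Delta t. \end{cases}$$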

To get a bound on $\mathbf{e}$, we use the continuity of $d\varphi_{\mathbf{s}}$ as a function of the point $\mathbf{s}$. On the closed bounded set $\overline{U}$ (Lemma 9.1), the four components $x_s, x_t, y_s, y_t$ of $d\varphi_{\mathbf{s}}$ are uniformly continuous. Thus, for $\varepsilon > 0$ as given in the statement of the lemma, there is a $\delta > 0$ such that

$$\|(s_1 - s_2, t_1 - t_2)\| < \delta \implies |x_s(s_1,t_1) - x_s(s_2,t_2)| < \varepsilon,$$

and likewise for $x_t$, $y_s$, and $y_t$. Now choose $K$ so that the mesh size $\|\mathcal{J}_K\| = \sqrt2/2^K$ is less than $\delta$ (Definition 8.14, p. 291). Then, for any $k \ge K$, we have $\|\mathcal{J}_k\| \le \|\mathcal{J}_K\| < \delta$.

Now let $Q \subset U$ be a square of $\mathcal{J}_k$, $k \ge K$; then

$$\|(a_1 - a, b_1 - b)\| < \delta \quad\text{and}\quad \|(a_2 - a, b_2 - b)\| < \delta,$$

and $0 \le \Delta s, \Delta t \le 1/2^k$. Therefore,

$$\begin{cases} |\Delta x| \le |x_s(a_1,b_1) - x_s(a,b)|\,\Delta s + |x_t(a_1,b_1) - x_t(a,b)|\,\Delta t < \dfrac{2\varepsilon}{2^k}, \\[2ex] |\Delta y| \le |y_s(a_2,b_2) - y_s(a,b)|\,\Delta s + |y_t(a_2,b_2) - y_t(a,b)|\,\Delta t < \dfrac{2\varepsilon}{2^k}, \end{cases}$$

so $\|\mathbf{e}\| < 2\varepsilon\sqrt2/2^k = W$.

Thus, every point of $\Delta\varphi_{\mathbf{a}}(Q) = \varphi(Q)$ lies either in the parallelogram $d\varphi_{\mathbf{a}}(Q)$ or within the distance $W$ from one of the four lines that make up its boundary. So $\varphi(Q)$ is contained in the union of the parallelogram and four rectangles of length $L + 2W$ and width $2W$, where $L$ is the length of the longer side of the parallelogram. A bound on the length $L$ will thus lead to a bound on the outer area of $\varphi(Q)$.



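To get a bound on $L$ it is enough to consider the two sides of $d\varphi_{\mathbf{a}}(Q)$ that meet at the corner $\varphi(a,b)$. These are the vectors

$$d\varphi_{\mathbf{a}}\begin{pmatrix} 1/2^k \\ 0 \end{pmatrix} = \frac{1}{2^k}\begin{pmatrix} x_s(a,b) \\ y_s(a,b) \end{pmatrix} \quad\text{and}\quad d\varphi_{\mathbf{a}}\begin{pmatrix} 0 \\ 1/2^k \end{pmatrix} = \frac{1}{2^k}\begin{pmatrix} x_t(a,b) \\ y_t(a,b) \end{pmatrix}.$$

Because $d\varphi_{\mathbf{s}}$ is a continuous function of $\mathbf{s}$ and $Q$ lies in the closed bounded set $\overline{U}$, the four component functions of $d\varphi_{\mathbf{s}}$ are uniformly bounded on $\overline{U}$: for some $B > 0$,

$$|x_s(s,t)| \le B, \quad |x_t(s,t)| \le B, \quad |y_s(s,t)| \le B, \quad |y_t(s,t)| \le B,$$

for all $(s,t)$ in $\overline{U}$. This implies $L \le B\sqrt2/2^k$; thus, for each of the four rectangles,

$$\text{area} = 2W(L + 2W) < \frac{4\varepsilon\sqrt2}{2^k}\left( \frac{B\sqrt2}{2^k} + \frac{4\varepsilon\sqrt2}{2^k} \right) = \frac{8\varepsilon(B + 4\varepsilon)}{2^{2k}}.$$

Because $\varepsilon(B + 4\varepsilon) = O(\varepsilon)$ as $\varepsilon \to 0$ and $A(Q) = 1/2^{2k}$, we have

$$\overline{A}(\varphi(Q)) \le A(d\varphi_{\mathbf{a}}(Q)) + 4\cdot\frac{8\varepsilon(B + 4\varepsilon)}{2^{2k}} \le M\,A(Q) + O(\varepsilon)\,A(Q).$$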


□

Lemma 9.3. If $S$ has area and $\overline{S} \subset \Omega$, then

$$\overline{A}(\varphi(S)) \le \iint_S |J_\varphi(s,t)|\,ds\,dt.$$

Proof. To prove this lemma, we construct upper Darboux sums for $|J_\varphi(s,t)|$ on $S$ (cf. Definition 8.17, p. 300).

By the proof of Lemma 9.1, $\overline{S}$ is contained in the interior of a bounded closed set $\overline{U} \subset \Omega$ whose points lie within distance $m/2$ of $\overline{S}$. Fix $\varepsilon > 0$ and choose $K$ to satisfy both the previous lemma and the condition that the mesh size $\|\mathcal{J}_K\|$ is less than $m/2$. Then, if $k \ge K$, any square $Q$ of $\mathcal{J}_k$ that meets $\overline{S}$ will lie within $U$. Let $Q_1, \ldots, Q_I$ be the squares of $\mathcal{J}_k$ that meet $\overline{S}$. Set

$$M_i = \max_{(s,t) \in Q_i} |J_\varphi(s,t)|, \quad i = 1, \ldots, I;$$

then, by the previous lemma,

$$\overline{A}(\varphi(Q_i)) \le (M_i + O(\varepsilon))\,A(Q_i), \quad i = 1, \ldots, I.$$

Because $\overline{S} \subset Q_1 \cup \cdots \cup Q_I$ and thus $\varphi(\overline{S}) \subset \varphi(Q_1) \cup \cdots \cup \varphi(Q_I)$, we have

$$\overline{A}(\varphi(S)) \le \sum_{i=1}^{I} \overline{A}(\varphi(Q_i)) \le \sum_{i=1}^{I} (M_i + O(\varepsilon))\,A(Q_i) = \sum_{i=1}^{I} M_i\,A(Q_i) + \sum_{i=1}^{I} O(\varepsilon)\,A(Q_i).$$

The first term is the upper Darboux sum for $|J_\varphi(s,t)|$ over $S$ and the grid $\mathcal{J}_k$; the second is $O(\varepsilon)$ times the outer area of $S$ over the same grid; that is,

$$\overline{A}(\varphi(S)) \le \overline{D}_{\mathcal{J}_k}(|J_\varphi(s,t)|, S) + O(\varepsilon)\,\overline{A}_k(S).$$

Because $|J_\varphi(s,t)|$ is integrable over $S$, the upper Darboux sums converge to the integral (and the outer areas to the area) as $k \to \infty$; thus

$$\overline{A}(\varphi(S)) \le \iint_S |J_\varphi(s,t)|\,ds\,dt + O(\varepsilon)\,A(S).$$

This inequality holds for any $\varepsilon > 0$; therefore it continues to hold as $\varepsilon \to 0$ (and hence as $O(\varepsilon) \to 0$), so

$$\overline{A}(\varphi(S)) \le \iint_S |J_\varphi(s,t)|\,ds\,dt. \qquad □$$

Part of the assertion of the main theorem is that $\varphi(S)$ has area (implying we are able to replace $\overline{A}(\varphi(S))$ by $A(\varphi(S))$ in the lemma just proven). The next two lemmas establish that $\varphi(S)$ does indeed have area by showing that its boundary $\partial(\varphi(S))$ has outer area equal to zero.

Lemma 9.4. $\partial(\varphi(S)) \subset \varphi(\partial S)$.

Proof. Let $\mathbf{x} = \varphi(\mathbf{s})$ be a boundary point of $\varphi(S)$; we must show that $\mathbf{s}$ is a boundary point of $S$. We use the criterion established in Exercise 8.5 (p. 313): every open disk centered at a boundary point of a set $T$ contains at least one point in $T$ and one point not in $T$.

Let $D_1 \subset \Omega$ be an open disk centered at $\mathbf{s}$. By a corollary to the inverse function theorem (Corollary 5.3, p. 174), $\mathbf{x}$ is an interior point of $\varphi(D_1)$, so there is an open disk $D_2$ centered at $\mathbf{x}$ for which $D_2 \subset \varphi(D_1)$. But $\mathbf{x}$ is a boundary point of $\varphi(S)$, so $D_2$ contains a point $\mathbf{p}_2$ in $\varphi(S)$ and another point $\mathbf{q}_2$ that is not in $\varphi(S)$. But then the point $\mathbf{p}_1 = \varphi^{-1}(\mathbf{p}_2)$ in $D_1$ is in $S$ and the point $\mathbf{q}_1 = \varphi^{-1}(\mathbf{q}_2)$ in $D_1$ is not in $S$. (Draw a picture.) □

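Lemma 9.5. $\overline{A}(\partial(\varphi(S))) = 0$.

Proof. Apply Lemma 9.3 to the zero-area set $\partial S$. If $|J_\varphi(s,t)| \le B$ on $\partial S$, then

$$\overline{A}(\partial(\varphi(S))) \le \overline{A}(\varphi(\partial S)) \le \iint_{\partial S} |J_\varphi(s,t)|\,ds\,dt \le B\,A(\partial S) = 0. \qquad □$$

Corollary 9.9 $A(\varphi(S)) \le \displaystyle\iint_S |J_\varphi(s,t)|\,ds\,dt$. □

Lemma 9.6. $A(\varphi(S)) \ge \displaystyle\iint_S |J_\varphi(s,t)|\,ds\,dt$.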


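Proof. The idea of the proof is to apply the previous arguments to the inverse map $\varphi^{-1}$ and a set $T = \varphi(R)$, where $R$ has area and $R \cup \partial R \subset \Omega$. By Corollary 9.9, $T$ has area. Furthermore,

$$\overline{T} = T \cup \partial T \subset \varphi(R \cup \partial R) \subset \varphi(\Omega),$$

so we can write

$$A(R) = A(\varphi^{-1}(T)) \le \iint_{\varphi(R)} |J_{\varphi^{-1}}(x,y)|\,dx\,dy.$$

Now let $R$ be a square $Q$ of the grid $\mathcal{J}_k$, and let $\mu$ be the minimum value of $|J_\varphi(s,t)|$ on $Q$. Note that $\mu > 0$, because $d\varphi_{\mathbf{s}}$ is invertible and a uniformly continuous function of $\mathbf{s}$ on $Q$. Using Corollary 4.13, page 138, we find

$$|J_{\varphi^{-1}}(x,y)| = \frac{1}{|J_\varphi(s(x,y),t(x,y))|} \le \frac{1}{\mu}$$

for all $(x,y)$ in $\varphi(Q)$. Therefore,

$$A(Q) \le \iint_{\varphi(Q)} |J_{\varphi^{-1}}(x,y)|\,dx\,dy \le \frac{1}{\mu}\,A(\varphi(Q)), \quad\text{or}\quad A(\varphi(Q)) \ge \mu\,A(Q).$$

Let $P_1, \ldots, P_J$ be the squares of the grid $\mathcal{J}_k$ that are entirely contained in $S$, and let

$$\mu_j = \min_{(s,t) \in P_j} |J_\varphi(s,t)|, \quad j = 1, \ldots, J.$$

Because $S \supseteq P_1 \cup \cdots \cup P_J$ and $\varphi(S) \supseteq \varphi(P_1) \cup \cdots \cup \varphi(P_J)$, we have

$$A(\varphi(S)) \ge \sum_{j=1}^{J} A(\varphi(P_j)) \ge \sum_{j=1}^{J} \mu_j\,A(P_j) = \underline{D}_{\mathcal{J}_k}(|J_\varphi|, S),$$

the lower Darboux sum for $|J_\varphi|$ over $S$ and the grid $\mathcal{J}_k$. In the limit as $k \to \infty$, the Darboux sum becomes the integral:

$$A(\varphi(S)) \ge \iint_S |J_\varphi(s,t)|\,ds\,dt. \qquad □$$

This completes the proof of Theorem 9.8. □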
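Theorem 9.8 can be illustrated numerically. The sketch below is an informal check under stated assumptions, not part of the proof: it takes the polar map as the example $\varphi$ (a choice made here), measures the area of the image of a square with the shoelace formula applied to a finely sampled boundary, and compares it with a midpoint sum of $|J_\varphi| = |s|$. It also shows the ratio $A(\varphi(Q))/A(Q)$ approaching $|J_\varphi|$ as the square shrinks. All function names are hypothetical.

```python
import math

def phi(s, t):
    """Example map (assumed for illustration): polar coordinates, s = r, t = theta."""
    return (s * math.cos(t), s * math.sin(t))

def abs_jacobian(s, t):
    """|J_phi(s, t)| for the polar map."""
    return abs(s)

def image_area(s0, t0, h, n=400):
    """Area of phi(Q), Q = [s0, s0+h] x [t0, t0+h], via the shoelace formula
    on the image of the boundary of Q, sampled at 4n points."""
    pts = [phi(s0 + h * k / n, t0) for k in range(n)]            # bottom edge
    pts += [phi(s0 + h, t0 + h * k / n) for k in range(n)]       # right edge
    pts += [phi(s0 + h - h * k / n, t0 + h) for k in range(n)]   # top edge
    pts += [phi(s0, t0 + h - h * k / n) for k in range(n)]       # left edge
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def jacobian_integral(s0, t0, h, n=400):
    """Midpoint sum of |J_phi| over the square Q."""
    d = h / n
    return sum(abs_jacobian(s0 + (i + 0.5) * d, t0 + (j + 0.5) * d) * d * d
               for i in range(n) for j in range(n))

area = image_area(1.0, 0.3, 1.0)             # image is an annular sector of area 3/2
integral = jacobian_integral(1.0, 0.3, 1.0)

# Ratio A(phi(Q)) / A(Q) for shrinking squares with corner (2, 0.5), where |J| = 2.
ratios = [image_area(2.0, 0.5, h) / (h * h) for h in (0.1, 0.01, 0.001)]
print(area, integral, ratios)
```

For this map the ratios are close to $2 + h/2$, so they approach the value of the Jacobian at the corner as $h \to 0$.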

We have already constructed new integration grids from old ones using invertible linear maps (cf. pp. 287-293): if the grid $\mathcal{G}$ has cells $Q_i$ and $L : \mathbb{R}^2 \to \mathbb{R}^2$ is invertible, then the images $L(Q_i)$ form the cells of a new grid $\mathcal{H} = L(\mathcal{G})$. Furthermore, $L$ determines a constant $\sigma$ that relates the mesh sizes of $\mathcal{G}$ and $\mathcal{H}$: $\|\mathcal{H}\| = \sigma\,\|\mathcal{G}\|$. Theorem 9.8 opens the possibility of creating new grids using invertible nonlinear maps.

Let $\varphi : \Omega \to \varphi(\Omega)$ be continuously differentiable with a continuously differentiable inverse on $\varphi(\Omega)$. Let $\mathcal{G}$ be a grid whose cells $Q_i$ lie within $\Omega$. By definition, the $Q_i$ are closed nonoverlapping sets with area. By Theorem 9.8, the sets $P_i = \varphi(Q_i)$ are likewise closed nonoverlapping sets with area. We define them to be the cells of the grid $\mathcal{H} = \varphi(\mathcal{G})$.

For a nonlinear map $\varphi$, there is no general analogue to the scale factor $\sigma$. However, suppose $S$ has area and is a closed subset of $\Omega$. Then the sets

$$Q_i' = Q_i \cap S \quad\text{and}\quad P_i' = P_i \cap \varphi(S) = \varphi(Q_i')$$

are closed nonoverlapping sets with area, so they constitute the cells of grids that we denote $\mathcal{G}_S$ and $\mathcal{H}_{\varphi(S)}$, respectively. For these special grids, there is a natural link between their mesh sizes. To find it, note that the continuous map $\varphi$ is uniformly continuous on $S$. Therefore, given any $\varepsilon > 0$ there is a $\delta > 0$ such that

$$\|\mathbf{s}_1 - \mathbf{s}_2\| < \delta \implies \|\varphi(\mathbf{s}_1) - \varphi(\mathbf{s}_2)\| < \varepsilon.$$

This implies

$$\|\mathcal{G}_S\| < \delta \implies \|\mathcal{H}_{\varphi(S)}\| < \varepsilon;$$

in other words, we can make $\|\mathcal{H}_{\varphi(S)}\|$ as small as we wish by making $\|\mathcal{G}_S\|$ sufficiently small.

When calculating Riemann sums for a continuous function $f$ defined only on a set $S$, we first define $f$ to be zero outside $S$ in order to allow for the possibility that $f$ is evaluated on the part of a cell that lies outside $S$. Because $f$ on this larger domain is usually not continuous, a delicate argument is needed to show that the Riemann sums converge. The grids $\mathcal{G}_S$ eliminate this problem, because their cells lie entirely in $S$. The following theorem shows that Riemann sums constructed with these restricted grids still converge to the integral.

Theorem 9.10. Suppose $f$ is continuous on a closed bounded set $S$ that has area. Then Riemann sums constructed with grids $\mathcal{G}_S$ converge to the integral of $f$ as $\|\mathcal{G}_S\| \to 0$.

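Proof. Let $\varepsilon > 0$ be given; we must find a $\delta > 0$ so that

$$\left| \iint_S f(x,y)\,dA - \sum_{i=1}^{I} f(x_i,y_i)\,A(Q_i) \right| < \varepsilon$$

for any grid $\mathcal{G}_S$ with $\|\mathcal{G}_S\| < \delta$, and for any point $(x_i,y_i)$ in the cell $Q_i$ of $\mathcal{G}_S$, for each $i = 1, \ldots, I$.

Because $f$ is uniformly continuous on $S$, we can choose $\delta > 0$ so that

$$\|(x,y) - (x',y')\| < \delta \implies |f(x,y) - f(x',y')| < \frac{\varepsilon}{A(S)}.$$

Now let $\mathcal{G}_S$ be any grid for which $\|\mathcal{G}_S\| < \delta$. Then, because

$$\iint_S f(x,y)\,dA = \sum_{i=1}^{I} \iint_{Q_i} f(x,y)\,dA \quad\text{and}\quad f(x_i,y_i)\,A(Q_i) = \iint_{Q_i} f(x_i,y_i)\,dA,$$

we have

$$\left| \iint_S f(x,y)\,dA - \sum_{i=1}^{I} f(x_i,y_i)\,A(Q_i) \right| = \left| \sum_{i=1}^{I} \iint_{Q_i} f(x,y)\,dA - \sum_{i=1}^{I} \iint_{Q_i} f(x_i,y_i)\,dA \right|$$

$$\le \sum_{i=1}^{I} \iint_{Q_i} |f(x,y) - f(x_i,y_i)|\,dA < \sum_{i=1}^{I} \iint_{Q_i} \frac{\varepsilon}{A(S)}\,dA = \varepsilon. \qquad □$$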


We can use this result immediately, to prove the main formula on the change of variables in double integrals (by continuing to follow the argument of J. Schwartz in [16]).

Theorem 9.11 (Change of variables). Let $\Omega$ be a bounded open set in $\mathbb{R}^2$, and let $\varphi : \Omega \to \mathbb{R}^2 : (s,t) \mapsto (x,y)$ be a continuously differentiable map that has a continuously differentiable inverse $\varphi^{-1} : \varphi(\Omega) \to \Omega$. Suppose the function $f(x,y)$ is continuous on a closed set $D \subset \varphi(\Omega)$ that has area; then

$$\iint_D f(x,y)\,dx\,dy = \iint_{\varphi^{-1}(D)} f(x(s,t),y(s,t))\,\left|\frac{\partial(x,y)}{\partial(s,t)}\right|\,ds\,dt.$$

Proof. By Theorem 8.35, page 305, $f(x,y)$ is integrable on $D$. By Theorem 9.8, $S = \varphi^{-1}(D)$ has area and the function

$$f(x(s,t),y(s,t))\,\left|\frac{\partial(x,y)}{\partial(s,t)}\right|$$

is bounded and continuous on $S = \varphi^{-1}(D)$, so it is integrable there. To prove that the two integrals in the statement of the theorem are equal, we show they differ by less than any preassigned $\varepsilon > 0$.

Let $\mathcal{G}_S$ be an arbitrary integration grid whose cells $Q_i$, $i = 1, \ldots, I$, partition $S$, and let $\mathcal{H}_D = \varphi(\mathcal{G}_S)$ be the image grid; its cells $P_i = \varphi(Q_i)$ partition $D = \varphi(S)$. Let $(s_i,t_i)$ be a point in $Q_i$, and let $(x_i,y_i) = \varphi(s_i,t_i)$ be the corresponding point in $P_i$. Consider the following, obtained by applying the triangle inequality to a rather lengthy telescoping sum:

$$\left| \iint_D f(x,y)\,dx\,dy - \iint_S f(\varphi(s,t))\,|J_\varphi(s,t)|\,ds\,dt \right|$$

$$\le \left| \iint_D f(x,y)\,dx\,dy - \sum_{i=1}^{I} f(x_i,y_i)\,A(P_i) \right|$$

$$+ \left| \sum_{i=1}^{I} f(x_i,y_i)\,A(P_i) - \sum_{i=1}^{I} \iint_{Q_i} f(\varphi(s_i,t_i))\,|J_\varphi(s,t)|\,ds\,dt \right|$$

$$+ \left| \sum_{i=1}^{I} \iint_{Q_i} \big( f(\varphi(s_i,t_i)) - f(\varphi(s,t)) \big)\,|J_\varphi(s,t)|\,ds\,dt \right|$$

$$+ \left| \sum_{i=1}^{I} \iint_{Q_i} f(\varphi(s,t))\,|J_\varphi(s,t)|\,ds\,dt - \iint_S f(\varphi(s,t))\,|J_\varphi(s,t)|\,ds\,dt \right|.$$


Now consider each of the four terms on the right-hand side of the inequality.

The first term contains a Riemann sum for $f$ on $D$ and the grid $\mathcal{H}_D$. Because $f$ is integrable, there is a $\delta_1 > 0$ that makes that term less than $\varepsilon/2$ whenever $\|\mathcal{H}_D\| < \delta_1$ and $(x_i,y_i)$ is an arbitrary point in $P_i$. As we saw above (p. 349), we have $\|\mathcal{H}_D\| < \delta_1$ when $\|\mathcal{G}_S\| < \delta_2$ for some properly chosen $\delta_2 > 0$.

The second term is zero by Theorem 9.8:

$$A(P_i) = A(\varphi(Q_i)) = \iint_{Q_i} |J_\varphi(s,t)|\,ds\,dt, \quad i = 1, \ldots, I.$$

For the third term, first note that

$$\left| \sum_{i=1}^{I} \iint_{Q_i} \big( f(\varphi(s_i,t_i)) - f(\varphi(s,t)) \big)\,|J_\varphi(s,t)|\,ds\,dt \right| \le \sum_{i=1}^{I} \iint_{Q_i} \big| f(\varphi(s_i,t_i)) - f(\varphi(s,t)) \big|\,|J_\varphi(s,t)|\,ds\,dt.$$

Because $f(\varphi(s,t))$ is uniformly continuous on $S$, there is a $\delta_3 > 0$ for which

$$\|(s,t) - (s',t')\| < \delta_3 \implies |f(\varphi(s,t)) - f(\varphi(s',t'))| < \frac{\varepsilon}{2A(D)}.$$

Therefore, if $\mathcal{G}_S$ is any partition of $S$ for which $\|\mathcal{G}_S\| < \delta_3$, then

$$\sum_{i=1}^{I} \iint_{Q_i} \big| f(\varphi(s_i,t_i)) - f(\varphi(s,t)) \big|\,|J_\varphi(s,t)|\,ds\,dt < \sum_{i=1}^{I} \iint_{Q_i} \frac{\varepsilon}{2A(D)}\,|J_\varphi(s,t)|\,ds\,dt = \frac{\varepsilon}{2A(D)} \iint_S |J_\varphi(s,t)|\,ds\,dt = \frac{\varepsilon}{2}.$$

The last equality in this chain is provided by Theorem 9.8:

$$\iint_S |J_\varphi(s,t)|\,ds\,dt = A(\varphi(S)) = A(D).$$

The fourth term, like the second, is zero. Therefore, the two integrals differ by less than $\varepsilon$ whenever the partition $\mathcal{G}_S$ satisfies $\|\mathcal{G}_S\| < \min(\delta_2, \delta_3)$. Because $\varepsilon > 0$ is arbitrary, the two integrals must be equal. □


Because $|\det L|$ is the area magnification factor for a linear map $L$ of the plane, we have

$$\frac{A(L(S))}{A(S)} = |\det L|$$

for any subset $S$ of the plane that has area. For the nonlinear map $\varphi$, we introduce the set function (cf. pp. 310-312)

$$M(S) = A(\varphi(S)) = \iint_S |J_\varphi(s,t)|\,ds\,dt.$$

By Theorem 8.39, page 312, the derivative of $M$ is

$$M'(s,t) = |J_\varphi(s,t)|.$$

In other words, if $S$ contains the point $(s,t)$, then

$$\frac{A(\varphi(S))}{A(S)} \approx |J_\varphi(s,t)|$$

as closely as we wish by making the diameter of $S$ (p. 291) sufficiently small. It is in this sense that we consider

$$|J_\varphi(s,t)| = \big|\det d\varphi_{(s,t)}\big|$$

to be the local area magnification factor for $\varphi$ when area is understood to be nonnegative Jordan content.

9.4 Orientation

In the next section, we introduce Green’s theorem as an additional tool to evaluate 
double integrals. However, the integrals in Green’s theorem are oriented. In this 
section, therefore, we say what it means for a 2-dimensional region to be oriented, 
and then define oriented double integrals. Finally, we extend the change of variables 
formula to oriented integrals. 

Orientation in the plane involves, either explicitly or implicitly, comparison with the coordinate axes (or with the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2$ that determine them). Consider first an ordered pair of linearly independent vectors $\{\mathbf{v}_1, \mathbf{v}_2\}$ in $\mathbb{R}^2$. To compare $\{\mathbf{v}_1, \mathbf{v}_2\}$ with $\{\mathbf{e}_1, \mathbf{e}_2\}$, let $L : \mathbb{R}^2 \to \mathbb{R}^2$ be the unique linear map for which $L(\mathbf{e}_1) = \mathbf{v}_1$, $L(\mathbf{e}_2) = \mathbf{v}_2$. Then we say the pair $\{\mathbf{v}_1, \mathbf{v}_2\}$ has the same orientation as $\{\mathbf{e}_1, \mathbf{e}_2\}$ if $\det L > 0$; otherwise, we say it has the opposite orientation. In particular, if we reverse the order of the vectors, orientation is reversed, as well: $\{\mathbf{v}_1, \mathbf{v}_2\}$ and $\{\mathbf{v}_2, \mathbf{v}_1\}$ always have opposite orientations. We write $\{\mathbf{v}_2, \mathbf{v}_1\} = -\{\mathbf{v}_1, \mathbf{v}_2\}$. The components of $\mathbf{v}_1$ and $\mathbf{v}_2$ with respect to the standard basis determine the orientation of $\{\mathbf{v}_1, \mathbf{v}_2\}$. That is, if

$$\mathbf{v}_1 = \begin{pmatrix} v_{11} \\ v_{21} \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} v_{12} \\ v_{22} \end{pmatrix},$$

then the matrix of $L$ in terms of the standard basis is

$$\begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}, \quad\text{and}\quad \det L = v_{11}v_{22} - v_{12}v_{21}.$$

We also say that an ordered pair that has the same orientation as the standard basis is positively oriented; otherwise, we say it is negatively oriented. The figure in the margin helps make the point that orientation is always determined by reference to the coordinate axes: it is relative, not absolute. The pair $\{\mathbf{v}_1, \mathbf{v}_2\}$ illustrated is positively oriented.

The figure also shows that the ordering of a pair of linearly independent vectors implicitly defines a sense of rotation, namely, rotation from the first to the second through the smaller of the two angles determined by the vectors. “Sense of rotation” therefore gives us a second way to represent orientation. For example, we can confirm that the pair $\{\mathbf{v}_1, \mathbf{v}_2\}$ in the margin figure is positively oriented because it defines the same counterclockwise sense of rotation as the basis vectors.
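The determinant test for orientation is easy to state in code. The following is a minimal sketch; the function names `det2` and `orientation` are introduced here for illustration only.

```python
def det2(v1, v2):
    """Determinant of the 2x2 matrix whose columns are v1 and v2."""
    return v1[0] * v2[1] - v2[0] * v1[1]

def orientation(v1, v2):
    """+1 if the ordered pair {v1, v2} is positively oriented
    (same orientation as the standard basis), -1 if negatively oriented."""
    d = det2(v1, v2)
    if d == 0:
        raise ValueError("vectors are linearly dependent; no orientation")
    return 1 if d > 0 else -1

e1, e2 = (1.0, 0.0), (0.0, 1.0)
v1, v2 = (2.0, 1.0), (-1.0, 3.0)

print(orientation(e1, e2))   # standard basis
print(orientation(v1, v2))   # v2 lies counterclockwise from v1
print(orientation(v2, v1))   # reversing the order reverses the orientation
```

Swapping the two vectors swaps the columns of the matrix and so changes the sign of the determinant, which is the relation $\{\mathbf{v}_2, \mathbf{v}_1\} = -\{\mathbf{v}_1, \mathbf{v}_2\}$.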

To orient a region $S$ with area in $\mathbb{R}^2$, orient each point $\mathbf{p}$ of $S$ by assigning to $\mathbf{p}$ an ordered pair of linearly independent vectors $\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ in such a way that both $\mathbf{v}_1(\mathbf{p})$ and $\mathbf{v}_2(\mathbf{p})$ vary continuously with $\mathbf{p}$ over $S$. To indicate that $S$ has acquired an orientation, we write it as $\vec{S}$. (On page 7, we introduced a similar notation for curves: $\vec{C}$ denotes a curve $C$ together with an orientation.) There are now two different definitions of orientation when $S$ is a parallelogram, but they agree if we assign to each point of $\mathbf{v} \wedge \mathbf{w}$ the ordered pair $\{\mathbf{v}, \mathbf{w}\}$. For each point $\mathbf{p}$ in $\vec{S}$, let $L_{\mathbf{p}} : \mathbb{R}^2 \to \mathbb{R}^2$ be the unique linear map for which $L_{\mathbf{p}}(\mathbf{e}_i) = \mathbf{v}_i(\mathbf{p})$, $i = 1, 2$. Then, by what we said above, the function

$$\det L_{\mathbf{p}} = v_{11}(\mathbf{p})\,v_{22}(\mathbf{p}) - v_{12}(\mathbf{p})\,v_{21}(\mathbf{p})$$

varies continuously with $\mathbf{p}$ and is never zero. We write $-\vec{S}$ to denote $\vec{S}$ with its orientation reversed at every point. That is, if $\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ defines $\vec{S}$, then $-\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\} = \{\mathbf{v}_2(\mathbf{p}), \mathbf{v}_1(\mathbf{p})\}$ defines $-\vec{S}$.

Theorem 9.12. On any pathwise-connected component of $S$ (i.e., a largest subset in which any two points can be joined by a continuous path in $S$), all points have the same orientation.

Proof. Let $\mathbf{p}$ and $\mathbf{q}$ be joined by the continuous path $\mathbf{s}(t)$, $a \le t \le b$, with $\mathbf{s}(a) = \mathbf{p}$ and $\mathbf{s}(b) = \mathbf{q}$. Then $\det L_{\mathbf{s}(t)}$ is a continuous and nonzero function of $t$ on the interval $[a,b]$, so it cannot change sign. Therefore $\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ and $\{\mathbf{v}_1(\mathbf{q}), \mathbf{v}_2(\mathbf{q})\}$ have the same orientation. □

In the figure, $S$ has two components, one with positive orientation and the other with negative. As the figure suggests, the orientation of $S$ always defines a “sense of rotation” at each point of $S$, and all points in any connected component of $S$ have the same sense of rotation.

Definition 9.3 Two assignments $\mathbf{p} \mapsto \{\mathbf{w}_1(\mathbf{p}), \mathbf{w}_2(\mathbf{p})\}$, $\mathbf{p} \mapsto \{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ define the same orientation of $S$ if the unique linear map $M_{\mathbf{p}}(\mathbf{v}_i(\mathbf{p})) = \mathbf{w}_i(\mathbf{p})$, $i = 1, 2$, has $\det M_{\mathbf{p}} > 0$ for all $\mathbf{p}$ in $S$.

Two different assignments of ordered pairs of vectors define the same orientation precisely when they give the same sense of rotation. Thus, each component of a region $S$ can be given exactly two orientations, either agreeing or disagreeing with the sense of rotation of the coordinate axes. If $S$ has $k$ components, it has $2^k$ possible orientations. We say $S$ is positively oriented if all its components are positively oriented, and is negatively oriented if all its components are negatively oriented. Usually, $S$ has just one component.

If $\partial S$ is a piecewise-smooth curve, then an orientation of $S$ induces an orientation of $\partial S$. To see how, let $S$ be oriented. As the previous definition indicates, there is considerable freedom in choosing the orientation vectors at a point. Thus, orient each point of $\partial S$ where it is smooth by the pair of vectors $\{\mathbf{n}, \mathbf{t}\}$, where $\mathbf{n}$ is the outward-pointing unit normal and $\mathbf{t}$ is one of the two unit tangents to $\partial S$, choosing $\mathbf{t}$ so $\{\mathbf{n}, \mathbf{t}\}$ gives the local sense of rotation on that component of $S$. In the figure above, the pair $\{\mathbf{n}, \mathbf{t}\}$ is oriented in the clockwise sense on one component and in the counterclockwise sense on the other. The choice of a tangent vector orients a piecewise-smooth curve. We use the tangent vector $\mathbf{t}$ to define the orientation of $\partial S$ induced by $S$.

A map can also induce an orientation of its image. Let $\Omega$ be a bounded open set in $\mathbb{R}^2$, and let $\varphi : \Omega \to \mathbb{R}^2$ be a continuously differentiable map that has a continuously differentiable inverse $\varphi^{-1} : \varphi(\Omega) \to \Omega$. Suppose $S$ is an oriented set that has area and its closure $\overline{S} = S \cup \partial S$ lies within $\Omega$. Suppose the ordered pair $\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ defines the orientation of $S$ at the point $\mathbf{p}$. Then we orient the point $\varphi(\mathbf{p})$ in $\varphi(S)$ with the ordered pair of vectors

$$\{d\varphi_{\mathbf{p}}(\mathbf{v}_1(\mathbf{p})),\ d\varphi_{\mathbf{p}}(\mathbf{v}_2(\mathbf{p}))\}.$$

Because $\varphi$ is invertible, each image point $\varphi(\mathbf{p})$ is assigned only one such pair. Because $d\varphi_{\mathbf{p}}$ is invertible, the image vectors are linearly independent. Finally, because $\varphi$ is continuously differentiable, the assignment varies continuously with $\varphi(\mathbf{p})$.

In Chapter 4 we first observed informally that the sign of the Jacobian of a map 
determines whether it preserves or reverses orientation. Now that we have defined 
the orientation of a region, we can state this observation as a theorem and prove it. 

Theorem 9.13. The regions S and $\varphi(S)$ have the same orientation if and only if the
Jacobian $\det d\varphi_p$ is everywhere positive.

Proof. Suppose the ordered pair $\{v_1(p), v_2(p)\}$ defines the orientation of S at p;
then $\{d\varphi_p(v_1(p)), d\varphi_p(v_2(p))\}$ defines the induced orientation of $\varphi(S)$ at $\varphi(p)$.
Let

$$L_p(e_i) = v_i(p) \quad\text{and}\quad M_p(e_i) = d\varphi_p(v_i(p)), \qquad i = 1, 2,$$

define the linear maps that determine the orientations. Then $\varphi(S)$ has the same
orientation at $\varphi(p)$ that S does at p if and only if the determinants $\det M_p$ and $\det L_p$
have the same sign. But

$$M_p = d\varphi_p \circ L_p, \qquad \det M_p = \det d\varphi_p \,\det L_p,$$

so $\det M_p$ and $\det L_p$ have the same sign if and only if $\det d\varphi_p > 0$. □
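The determinant criterion can be tried out on a linear map, where $d\varphi_p$ is the map itself. The following is a minimal sketch, not part of the text: the helper names `det2`, `apply`, and `signed_area`, and the two sample matrices, are illustrative assumptions. It compares the sign of the determinant with the sign of the shoelace (signed) area of a counterclockwise triangle before and after mapping.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v = (x, y)."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def signed_area(vertices):
    """Shoelace formula: positive when the vertices run counterclockwise."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # counterclockwise
shear = ((1.0, 2.0), (0.0, 1.0))                 # determinant +1 (hypothetical sample)
flip = ((0.0, 1.0), (1.0, 0.0))                  # determinant -1 (hypothetical sample)

for name, m in (("shear", shear), ("flip", flip)):
    image = [apply(m, v) for v in triangle]
    # The image keeps the sign of the signed area exactly when det > 0.
    print(name, det2(m), signed_area(triangle), signed_area(image))
```

In this sketch the shear leaves the triangle counterclockwise, while the flip reverses it, in line with the signs of the two determinants.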

We can now introduce oriented integrals, that is, double integrals defined over
oriented regions. We begin with a closed, bounded, and unoriented subset S of the
(x,y)-plane. Assume S has area and f(x,y) is a function that is integrable over S;
then we have the ordinary Riemann integral

$$\iint_S f(x,y)\,dA$$

as defined in Chapter 8.3. This integral is monotonic in the sense that $f(x,y) \ge 0$
on S implies $\iint_S f\,dA \ge 0$ (Theorem 8.28, p. 298).

Definition 9.4 If S has either positive or negative orientation, then the oriented
integral of f over S is

$$\iint_S f(x,y)\,dx\,dy = \operatorname{sgn} S \iint_S f(x,y)\,dA,$$

where sgn S = +1 when the orientation of S is positive and sgn S = −1 when it is
negative.


The oriented integral uses dx dy rather than dA as the “element of area” in order
to help convey orientation, in a way we explain below.

The definition has several immediate consequences. First, because −S is negatively
oriented when S is positively oriented, and vice versa, their oriented integrals
have opposite signs:

$$\iint_{-S} f(x,y)\,dx\,dy = -\iint_S f(x,y)\,dx\,dy.$$

Second, an oriented integral over a positively oriented region S is monotonic:

$$\iint_S f(x,y)\,dx\,dy \ge 0 \quad\text{if } f \ge 0 \text{ on } S.$$

Third, we can define the signed area of a positively or negatively oriented region S
as

$$\operatorname{area} S = \iint_S dx\,dy = \operatorname{sgn} S \times A(S), \qquad\text{where}\qquad A(S) = \iint_S dA$$

is the ordinary area (i.e., the Jordan measure) of the unoriented region S. Oriented
area reverses sign with the orientation of the region:

$$\operatorname{area}(-S) = -\operatorname{area}(S).$$


Writing the element of area in an oriented integral as dx dy rather than dA gives
us the opportunity to convey differences in orientation. If we take dx dy as representing
the coordinate axes in their usual order (i.e., positive orientation), then −dx dy
and dy dx should both represent the opposite order (negative orientation). Let us,
therefore, adopt the symbolic convention

$$dy\,dx = -dx\,dy,$$

so that

$$\iint_S f(x,y)\,dy\,dx = -\iint_S f(x,y)\,dx\,dy = \iint_{-S} f(x,y)\,dx\,dy$$

for any positively or negatively oriented region S. In particular,

$$\iint_S dy\,dx = -\iint_S dx\,dy = -\operatorname{area} S.$$

It is important to note that the sign change when switching from dy dx to dx dy
in an oriented integral does not happen when we reverse the order of integration in
iterated integrals (Corollary 9.2, p. 321). For example, if we integrate f(x,y) over
the rectangle $R: a \le x \le b,\ c \le y \le d$ and assume R is negatively oriented, then

$$\iint_R f(x,y)\,dx\,dy = -\iint_R f(x,y)\,dA = -\int_c^d\!\!\int_a^b f(x,y)\,dx\,dy = -\int_a^b\!\!\int_c^d f(x,y)\,dy\,dx.$$


As an example, let us compute the oriented integral of $f(x,y) = x^2$ over the
triangle S with vertices (0,0), (0,1), and (1,1), taken in that order. The boundary path
indicates that S is negatively oriented. Because $f(x,y) \ge 0$ on S, we therefore expect
the value of the integral to be negative. We first express the oriented integral as an
ordinary (unoriented) double integral, and then convert that to iterated integrals. For
the last step, we can describe the unoriented set S by either of the following sets of
inequalities:

$$S:\ \begin{cases} 0 \le x \le 1,\\ x \le y \le 1; \end{cases} \qquad\qquad S:\ \begin{cases} 0 \le y \le 1,\\ 0 \le x \le y. \end{cases}$$

Using the first set, we have

$$\iint_S x^2\,dx\,dy = -\iint_S x^2\,dA = -\int_0^1\!\left(\int_x^1 x^2\,dy\right)dx = -\int_0^1 (x^2 - x^3)\,dx = -\left(\frac13 - \frac14\right) = -\frac1{12}.$$

The second set leads to the same result.
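The value −1/12 can be checked numerically with both descriptions of the triangle. This is a rough sketch under stated assumptions: the function names and the midpoint rule with `n` subintervals are illustrative choices, and the inner integrals are entered in closed form.

```python
def oriented_integral_first(n=2000, sign=-1):
    """sgn(S) times the integral of x^2 over 0 <= x <= 1, x <= y <= 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x * x * (1.0 - x) * h  # inner integral of x^2 dy from y = x to y = 1
    return sign * total


def oriented_integral_second(n=2000, sign=-1):
    """sgn(S) times the integral of x^2 over 0 <= y <= 1, 0 <= x <= y."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * h
        total += (y ** 3 / 3.0) * h     # inner integral of x^2 dx from x = 0 to x = y
    return sign * total


first = oriented_integral_first()
second = oriented_integral_second()
print(first, second, -1.0 / 12.0)
```

The argument `sign=-1` encodes the negative orientation of S; passing `sign=+1` would give the integral over −S.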

In the oriented form of the change of variables formula for single integrals,

$$\int_a^b f(x)\,dx = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(s))\,\varphi'(s)\,ds,$$

the sign of the Jacobian $\varphi'(s)$ indicates whether the interval from a to b and the
interval from $\varphi^{-1}(a)$ to $\varphi^{-1}(b)$ have the same or opposite orientations. Here is the
analogous formula for oriented double integrals.

Theorem 9.14 (Oriented change of variables). Let $\Omega$ be a bounded open set in
$\mathbb{R}^2$, and let $\varphi : \Omega \to \mathbb{R}^2 : (s,t) \mapsto (x,y)$ be a continuously differentiable map that has
a continuously differentiable inverse $\varphi^{-1} : \varphi(\Omega) \to \Omega$. Suppose the function f(x,y)
is continuous on $D \subset \varphi(\Omega)$, a closed, oriented, and pathwise-connected region that
has area; then

$$\iint_D f(x,y)\,dx\,dy = \iint_{\varphi^{-1}(D)} f(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt.$$

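Proof. Because D has only one pathwise-connected component, all of its points
have the same orientation (Theorem 9.12), so sgn(D) is defined. Furthermore, the
Jacobian $J_\varphi$ cannot change sign on $\varphi^{-1}(D)$, so all points of $\varphi^{-1}(D)$ have the same
orientation, and $\operatorname{sgn}\varphi^{-1}(D) = \operatorname{sgn} J_{\varphi^{-1}} \operatorname{sgn} D = \operatorname{sgn} J_\varphi \operatorname{sgn} D$. Thus (using $dA_{x,y}$ and $dA_{s,t}$
to denote the unoriented elements of area),

$$\begin{aligned}
\iint_D f(x,y)\,dx\,dy &= \operatorname{sgn} D \iint_D f(x,y)\,dA_{x,y} \\
&= \operatorname{sgn} D \iint_{\varphi^{-1}(D)} f(x(s,t), y(s,t)) \left|\frac{\partial(x,y)}{\partial(s,t)}\right| dA_{s,t} \qquad\text{(Theorem 9.11)} \\
&= \operatorname{sgn} D \,\operatorname{sgn} J_\varphi \iint_{\varphi^{-1}(D)} f(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,dA_{s,t} \\
&= \operatorname{sgn}\varphi^{-1}(D) \iint_{\varphi^{-1}(D)} f(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,dA_{s,t} \\
&= \iint_{\varphi^{-1}(D)} f(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt. \qquad\square
\end{aligned}$$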


The following summary points out parallels between the ways that elements of
oriented single and double integrals transform under a change of variables (I is an
oriented interval on the x-axis):

$$x = x(s) \text{ on } I:\quad dx = \frac{dx}{ds}\,ds; \qquad\qquad (x,y) = (x(s,t), y(s,t)) \text{ on } D:\quad dx\,dy = \frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt.$$



To illustrate the use of the oriented change of variables formula, we first compute
the signed area of a curvilinear quadrilateral specified by curvilinear coordinates
$(s,t) \mapsto (x(s,t), y(s,t))$ in the (x,y)-plane. Let $\Omega$ be the infinite strip in the (s,t)-plane
given by $-\pi/2 < s < \pi/2$, and let $\varphi : \Omega \to \mathbb{R}^2$ be

$$\varphi:\ \begin{cases} x = \sin s \cosh t,\\ y = \cos s \sinh t; \end{cases} \qquad d\varphi_{(s,t)} = \begin{pmatrix} \cos s \cosh t & \sin s \sinh t \\ -\sin s \sinh t & \cos s \cosh t \end{pmatrix}.$$

Everywhere on $\Omega$, $\varphi$ is orientation-preserving:

$$\frac{\partial(x,y)}{\partial(s,t)} = \cos^2 s \cosh^2 t + \sin^2 s \sinh^2 t = \cos^2 s + \sinh^2 t > 0.$$

(Note: $\cosh^2 t = 1 + \sinh^2 t$.) In fact, $\varphi$ is a conformal map (Definition 4.2, p. 118)
that “flares out” $\Omega$ in such a way that the image of the vertical line s = constant lies
on the hyperbola

$$\frac{x^2}{\sin^2 s} - \frac{y^2}{\cos^2 s} = 1.$$

The focal points of this hyperbola are $(\pm f, 0)$, where $f^2 = \sin^2 s + \cos^2 s = 1$ (see
Exercise 9.24). The hyperbolas for various s are thus confocal. The image of the
horizontal line t = constant lies on the ellipse

$$\frac{x^2}{\cosh^2 t} + \frac{y^2}{\sinh^2 t} = 1.$$

Its focal points are $(\pm f, 0)$, where $f^2 = \cosh^2 t - \sinh^2 t = 1$. Thus the ellipses and
hyperbolas are all simultaneously confocal; the image of $\Omega$ is the entire plane minus
the two rays $|x| \ge 1$ on the x-axis.

Let D be the positively oriented curvilinear quadrilateral in the (x,y)-plane
bounded by

$$\frac{x^2}{\sin^2 a} - \frac{y^2}{\cos^2 a} = 1, \qquad \frac{x^2}{\cosh^2 c} + \frac{y^2}{\sinh^2 c} = 1,$$

$$\frac{x^2}{\sin^2 b} - \frac{y^2}{\cos^2 b} = 1, \qquad \frac{x^2}{\cosh^2 d} + \frac{y^2}{\sinh^2 d} = 1,$$

where $0 < a < b < \pi/2$ and $0 < c < d$. The rectangle $\varphi^{-1}(D)$ is also positively
oriented, and

$$\operatorname{area} D = \iint_D dx\,dy = \iint_{\varphi^{-1}(D)} (\cos^2 s + \sinh^2 t)\,ds\,dt = \int_c^d\!\!\int_a^b (\cos^2 s + \sinh^2 t)\,ds\,dt.$$

After some standard calculations (see Exercise 9.25), we find

$$\operatorname{area} D = (d - c)\,\frac{\sin 2b - \sin 2a}{4} + (b - a)\,\frac{\sinh 2d - \sinh 2c}{4}.$$
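The closed form for area D can be compared with a direct numerical evaluation of the iterated integral. The sketch below is only a sanity check under assumed sample values of a, b, c, d; the function names and the midpoint rule are illustrative choices, not part of the text.

```python
import math


def area_closed_form(a, b, c, d):
    """Closed form: (d-c)(sin 2b - sin 2a)/4 + (b-a)(sinh 2d - sinh 2c)/4."""
    return ((d - c) * (math.sin(2 * b) - math.sin(2 * a)) / 4.0
            + (b - a) * (math.sinh(2 * d) - math.sinh(2 * c)) / 4.0)


def area_midpoint(a, b, c, d, n=400):
    """Midpoint-rule estimate of the integral of cos^2 s + sinh^2 t over [a,b] x [c,d]."""
    hs = (b - a) / n
    ht = (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += math.cos(s) ** 2 + math.sinh(t) ** 2
    return total * hs * ht


# Hypothetical sample parameters with 0 < a < b < pi/2 and 0 < c < d.
a, b, c, d = 0.2, 1.0, 0.3, 1.1
print(area_closed_form(a, b, c, d), area_midpoint(a, b, c, d))
```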


For a second example, let us find

$$\iint_D \frac{(x-y)^2}{1+x+y}\,dx\,dy,$$

where D is the rectangle with vertices (1,−1), (2,0), (0,2), and (−1,1), taken in
that order. Thus D is positively oriented; as an unoriented set, it is given by the
inequalities

$$D:\ \begin{cases} 0 \le x+y \le 2,\\ -2 \le x-y \le 2. \end{cases}$$

The form of the integrand and the expressions in these inequalities suggest that we
set

$$\varphi^{-1}:\ \begin{cases} s = 1+x+y,\\ t = x-y; \end{cases} \qquad \frac{\partial(s,t)}{\partial(x,y)} = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2.$$

Then $\varphi^{-1}(D)$ is a simple rectangle with sides parallel to the coordinate axes:

$$\varphi^{-1}(D):\ \begin{cases} 1 \le s \le 3,\\ -2 \le t \le 2. \end{cases}$$

Because the Jacobian is negative, $\varphi^{-1}$ and $\varphi$ both reverse orientation, so $\varphi^{-1}(D)$
has negative orientation. To apply the change of variables formula, we need only

$$\frac{\partial(x,y)}{\partial(s,t)} = \frac{1}{\;\dfrac{\partial(s,t)}{\partial(x,y)}\;} = -\frac12$$

(Corollary 4.13, p. 138). Therefore,

$$\iint_D \frac{(x-y)^2}{1+x+y}\,dx\,dy = \iint_{\varphi^{-1}(D)} \frac{t^2}{s}\left(-\frac12\right) ds\,dt = +\frac12 \iint_{\varphi^{-1}(D)} \frac{t^2}{s}\,dA.$$

In the last equality, we convert the oriented integral over $\varphi^{-1}(D)$ into an ordinary
double integral over the unoriented set $\varphi^{-1}(D)$; the sign change occurs because
$\varphi^{-1}(D)$ is negatively oriented (Definition 9.4). To complete the computation, we
write

$$\frac12 \iint_{\varphi^{-1}(D)} \frac{t^2}{s}\,dA = \frac12 \int_1^3 \frac{ds}{s} \int_{-2}^{2} t^2\,dt = \frac12 \cdot \ln 3 \cdot \frac{16}{3} = \frac{8\ln 3}{3}.$$
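The sign bookkeeping in this example (a Jacobian of −1/2 combined with a negatively oriented preimage) can be replayed numerically. The sketch below is an illustration, not an independent proof: it evaluates the unoriented integral of $t^2/s$ over the rectangle by the midpoint rule and then applies the two signs explicitly. All names are hypothetical.

```python
import math


def unoriented_rectangle_integral(n=400):
    """Midpoint-rule estimate of the integral of t^2/s over 1 <= s <= 3, -2 <= t <= 2."""
    hs, ht = 2.0 / n, 4.0 / n
    total = 0.0
    for i in range(n):
        s = 1.0 + (i + 0.5) * hs
        for j in range(n):
            t = -2.0 + (j + 0.5) * ht
            total += t * t / s
    return total * hs * ht


jacobian = -0.5    # d(x,y)/d(s,t), the reciprocal of d(s,t)/d(x,y) = -2
sgn_preimage = -1  # the preimage rectangle is negatively oriented

value = sgn_preimage * jacobian * unoriented_rectangle_integral()
print(value, 8.0 * math.log(3.0) / 3.0)
```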


Our third example involves a similar integral,

$$\iint_D \frac{(x-y)^2(1+2y)}{1+x+y^2}\,dx\,dy;$$

D is the negatively oriented region that satisfies the inequalities

$$0 \le x+y^2 \le 4, \qquad 0 \le y \le x+2.$$

The integrand is nonnegative on D, but D is negatively oriented; therefore we expect the
value of the integral to be negative. Guided once again by the form of the integrand
and the shape of D, we change coordinates with

$$\varphi^{-1}:\ \begin{cases} s = 1+x+y^2,\\ t = x-y; \end{cases} \qquad \frac{\partial(s,t)}{\partial(x,y)} = \begin{vmatrix} 1 & 2y \\ 1 & -1 \end{vmatrix} = -(1+2y).$$

Two of the factors in the integrand transform readily, but 1 + 2y has no simple
expression in terms of s and t (but see Exercise 9.27). However, because
$\partial(x,y)/\partial(s,t) = -1/(1+2y)$, we find

$$(1+2y)\,dx\,dy = (1+2y)\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt = -ds\,dt.$$

Therefore, because $\varphi^{-1}$ is invertible on $1+2y > 0$ (see Exercise 9.27), the change
of variables formula is valid and we have

$$\iint_D \frac{(x-y)^2(1+2y)}{1+x+y^2}\,dx\,dy = -\iint_{\varphi^{-1}(D)} \frac{t^2}{s}\,ds\,dt,$$

where the image $\varphi^{-1}(D)$ is the positively oriented trapezoid defined by $1 \le s \le 5$,
$-2 \le t \le s-1$. Using iterated integrals and simple antiderivatives, we find

$$-\iint_{\varphi^{-1}(D)} \frac{t^2}{s}\,ds\,dt = -\int_1^5\!\!\int_{-2}^{s-1} \frac{t^2}{s}\,dt\,ds = -\int_1^5 \left.\frac{t^3}{3s}\right|_{-2}^{s-1} ds = -\frac{52}{9} - \frac73 \ln 5.$$
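Because this example depends on a nonlinear change of variables, an independent check in the original (x,y) coordinates is reassuring. The sketch below makes one assumption beyond the text: it describes D as $0 \le y \le 2$, $\max(-y^2, y-2) \le x \le 4-y^2$, which is what the stated inequalities reduce to. It then integrates directly with a midpoint rule and attaches the sign −1 for the negative orientation of D. Function names are illustrative.

```python
import math


def integrand(x, y):
    """The integrand (x - y)^2 (1 + 2y) / (1 + x + y^2)."""
    return (x - y) ** 2 * (1.0 + 2.0 * y) / (1.0 + x + y * y)


def unoriented_integral_xy(n=400):
    """Midpoint rule over D: 0 <= y <= 2, max(-y^2, y-2) <= x <= 4 - y^2 (assumed description)."""
    hy = 2.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * hy
        lo = max(-y * y, y - 2.0)
        hi = 4.0 - y * y
        hx = (hi - lo) / n
        row = 0.0
        for i in range(n):
            x = lo + (i + 0.5) * hx
            row += integrand(x, y)
        total += row * hx * hy
    return total


closed_form = -(52.0 / 9.0) - (7.0 / 3.0) * math.log(5.0)
numeric = -unoriented_integral_xy()  # the factor -1 encodes the negative orientation of D
print(numeric, closed_form)
```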

Notice that the factor 1 + 2y was crucial; without it, the integral would not have
been so easy to transform. In the integrals

$$\int (9+y+y^2)^{47}\,dy \qquad\text{and}\qquad \int (9+y+y^2)^{47}(1+2y)\,dy,$$

the same factor 1 + 2y plays the same role; the typographically simpler integral
on the left is mathematically more ponderous. When examples are contrived for
instructional purposes, they include such essential factors.

In Chapter 4, we defined the local area multiplier (or magnification factor) of a
nonlinear map $\varphi : \Omega \to \mathbb{R}^2$ at a point (a,b) to be the area multiplier of the linear
approximation $d\varphi_{(a,b)}$ at that point (Definition 4.4, p. 115). In the previous section
of this chapter, we justified that definition, at least up to sign. However, using the
notions of orientation and signed area, we can now remove the sign restriction.

Theorem 9.15. Suppose $\varphi(s,t)$ is continuously differentiable on an open set U, and
the Jacobian $J_\varphi(a,b) \ne 0$ at some point (a,b) in U. Then

$$\frac{\operatorname{area}\varphi(S)}{\operatorname{area} S} \to J_\varphi(a,b)$$

as the diameter $\delta(S) \to 0$, where the limit is taken over closed oriented sets S that
have signed area and contain the point (a,b).

Proof. We show that the limit of the quotient of signed areas is the derivative of the
ordinary set function (cf. pp. 310-312)

$$M(S) = \iint_S J_\varphi(s,t)\,dA,$$

where S is taken without its orientation.

The inverse function theorem (Theorem 5.2, p. 169) implies there is a smaller
open set $\Omega \subset U$ containing (a,b) on which $\varphi$ has a continuously differentiable
inverse. Because $\delta(S) \to 0$ in computing the limit of the quotient of signed areas, it is
sufficient to restrict S to closed oriented subsets of $\Omega$.

By taking $\Omega$ to be pathwise-connected, we can guarantee that the Jacobian
$J_\varphi(s,t)$ has constant sign on $\Omega$. Hence, because S has signed area, the same is true
of the image $\varphi(S)$, and we can write (cf. p. 356)

$$\operatorname{area}\varphi(S) = \operatorname{sgn}\varphi(S)\; A(\varphi(S)),$$

where area D denotes the signed area of the positively or negatively oriented
region D (p. 356). From Theorem 9.8 and the fact that $\operatorname{sgn} J_\varphi$ is well defined on S, we
get

$$A(\varphi(S)) = \iint_S |J_\varphi(s,t)|\,dA = \operatorname{sgn} J_\varphi \iint_S J_\varphi(s,t)\,dA = \operatorname{sgn} J_\varphi\; M(S).$$

Thus,

$$\operatorname{area}\varphi(S) = \operatorname{sgn}\varphi(S) \times A(\varphi(S)) = \operatorname{sgn}\varphi(S) \times \operatorname{sgn} J_\varphi \times M(S) = \operatorname{sgn} S \times M(S);$$

the identity $\operatorname{sgn}\varphi(S)\,\operatorname{sgn} J_\varphi = \operatorname{sgn} S$ follows from the proof of Theorem 9.14. We also have
$\operatorname{area} S = \operatorname{sgn} S \times A(S)$; thus it follows that

$$\frac{\operatorname{area}\varphi(S)}{\operatorname{area} S} = \frac{M(S)}{A(S)} \to M'(a,b) = J_\varphi(a,b)$$

for positively or negatively oriented sets S that contain (a,b) and for which $\delta(S) \to 0$
(Theorem 8.39, p. 312). □

The theorem implies that $\operatorname{area}\varphi(S) \approx J_\varphi(a,b)\,\operatorname{area} S$ for any sufficiently small
positively or negatively oriented region containing the point (a,b). For this reason
we say that the Jacobian $J_\varphi(a,b)$ is the local signed area magnification factor for
the map $\varphi$ at (a,b).
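The statement that the Jacobian is the local signed area magnification factor can be observed numerically. The following sketch is an illustration under assumptions: it approximates the image of a small square by the polygon through the images of sampled boundary points and compares shoelace areas. The two sample maps, with Jacobians $4(s^2+t^2)$ and $2t$, and all helper names are choices made here.

```python
def signed_area(vertices):
    """Shoelace formula: positive when the vertices run counterclockwise."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def square_boundary(a, b, h, m):
    """Counterclockwise boundary points of the square of side h centered at (a, b)."""
    corners = [(a - h / 2, b - h / 2), (a + h / 2, b - h / 2),
               (a + h / 2, b + h / 2), (a - h / 2, b + h / 2)]
    pts = []
    for k in range(4):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % 4]
        for i in range(m):
            u = i / m
            pts.append((x0 + u * (x1 - x0), y0 + u * (y1 - y0)))
    return pts


def area_ratio(phi, a, b, h=1e-3, m=50):
    """Signed area of the (polygonal) image of a small square, divided by the square's area."""
    boundary = square_boundary(a, b, h, m)
    image = [phi(s, t) for (s, t) in boundary]
    return signed_area(image) / signed_area(boundary)


def quadratic(s, t):
    return (s * s - t * t, 2 * s * t)  # Jacobian 4(s^2 + t^2)


def fold(s, t):
    return (s, t * t)                  # Jacobian 2t


print(area_ratio(quadratic, 0.5, 0.25), 4 * (0.5 ** 2 + 0.25 ** 2))
print(area_ratio(fold, 0.5, -0.3), 2 * (-0.3))
```

The second ratio comes out negative: near a point with t < 0 the map `fold` reverses orientation, so the image of a counterclockwise square is traversed clockwise.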

The change of variables formulas we have established for double integrals and
domains in $\mathbb{R}^2$ (Theorem 9.11 and Theorem 9.14) extend naturally to triple integrals
and domains in $\mathbb{R}^3$. We state the extensions here with the understanding that they
can be proved by adapting the proofs of the 2-dimensional versions. To help underscore
the distinction between the oriented and unoriented cases, we use dV as the
unoriented element of volume.


Theorem 9.16 (Change of variables in $\mathbb{R}^3$). Let $\Omega$ be a bounded open set in $\mathbb{R}^3$,
and let $\varphi : \Omega \to \mathbb{R}^3 : (r,s,t) \mapsto (x,y,z)$ be a continuously differentiable map that
has a continuously differentiable inverse $\varphi^{-1} : \varphi(\Omega) \to \Omega$. Suppose the function
f(x,y,z) is continuous on $D \subset \varphi(\Omega)$, a closed region that has volume; then

$$\iiint_D f(x,y,z)\,dV_{x,y,z} = \iiint_{\varphi^{-1}(D)} f(\varphi(r,s,t)) \left|\frac{\partial(x,y,z)}{\partial(r,s,t)}\right| dV_{r,s,t}. \qquad\square$$


Theorem 9.17 (Oriented change of variables in $\mathbb{R}^3$).
Let $\Omega$ be a bounded open set in $\mathbb{R}^3$, and let $\varphi : \Omega \to \mathbb{R}^3 : (r,s,t) \mapsto (x,y,z)$ be a
continuously differentiable map that has a continuously differentiable inverse
$\varphi^{-1} : \varphi(\Omega) \to \Omega$. Suppose the function f(x,y,z) is continuous on $D \subset \varphi(\Omega)$, a closed,
oriented, and pathwise-connected region that has volume; then

$$\iiint_D f(x,y,z)\,dx\,dy\,dz = \iiint_{\varphi^{-1}(D)} f(\varphi(r,s,t))\,\frac{\partial(x,y,z)}{\partial(r,s,t)}\,dr\,ds\,dt. \qquad\square$$

9.5 Green’s theorem 


Green’s theorem equates the double integral of a certain function over an oriented 
region in the plane to the path integral of a related expression over that region’s 
boundary (with its induced orientation). Each integral can be used to evaluate the 
other. We state and prove several increasingly more general versions of Green’s 
theorem, and then use the final version to extend the change of variables formula 
for oriented double integrals to settings in which the change of variables may not be 
invertible. 

The first version of Green’s theorem is a special case involving a positively oriented
region S that can be described in both of the following ways:

$$S:\ \begin{cases} a \le x \le b,\\ \gamma(x) \le y \le \delta(x); \end{cases} \qquad\qquad S:\ \begin{cases} c \le y \le d,\\ \alpha(y) \le x \le \beta(y); \end{cases}$$

we assume $\gamma(x)$ and $\delta(x)$ are continuous functions of x on [a,b], and $\alpha(y)$ and $\beta(y)$
are continuous functions of y on [c,d]. The orientation on S induces (p. 355) an
orientation on $\partial S$.


Theorem 9.18 (Green’s theorem). Suppose P(x,y) and Q(x,y) are continuously
differentiable functions defined on the closure of the region S, and $\partial S$ has the
orientation induced by S; then

$$\oint_{\partial S} P\,dx + Q\,dy = \iint_S \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy.$$

Proof. We assume S has positive orientation, and prove half of the equality,

$$\oint_{\partial S} P\,dx = \iint_S -\frac{\partial P}{\partial y}\,dx\,dy,$$

using the first description of S. The other half, involving Q and $\partial Q/\partial x$, is done in a
similar way using the second description of S.

First consider the path integral. As the figure indicates, we can partition the oriented
path $\partial S$ into four segments (or fewer: either vertical segment $C_2$ or $C_4$ may
reduce to a point). Neither vertical segment contributes to the path integral, because
x is constant and dx = 0 there. Consequently,

$$\oint_{\partial S} P\,dx = \int_{C_1} P\,dx + \int_{C_3} P\,dx.$$

On $C_1$ and $C_3$ we can use x itself as the parameter (but make x “run backwards”
from b to a for $C_3$); then

$$\int_{C_1} P\,dx = \int_a^b P(x,\gamma(x))\,dx, \qquad \int_{C_3} P\,dx = \int_b^a P(x,\delta(x))\,dx = -\int_a^b P(x,\delta(x))\,dx,$$

and hence

$$\oint_{\partial S} P\,dx = \int_a^b \bigl(P(x,\gamma(x)) - P(x,\delta(x))\bigr)\,dx.$$

That is, the path integral reduces to an ordinary single integral. We now show that
the double integral reduces to the same ordinary single integral:

$$\iint_S -\frac{\partial P}{\partial y}\,dx\,dy = \int_a^b\!\!\int_{\gamma(x)}^{\delta(x)} -\frac{\partial P}{\partial y}(x,y)\,dy\,dx = \int_a^b -P(x,y)\Big|_{\gamma(x)}^{\delta(x)}\,dx = \int_a^b \bigl(P(x,\gamma(x)) - P(x,\delta(x))\bigr)\,dx.$$

This completes half the proof; use a similar argument with Q and with the second
description of S to prove the other half. □
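Green’s theorem can be tested numerically on a simple region. The sketch below uses the positively oriented unit square and the sample field $P = xy^2$, $Q = x^3$, for which $Q_x - P_y = 3x^2 - 2xy$. The field, the function names, and the midpoint rule are all assumptions made for this illustration.

```python
def path_integral(P, Q, vertices, n=2000):
    """Midpoint-rule estimate of the integral of P dx + Q dy around a closed polygon."""
    total = 0.0
    m = len(vertices)
    for k in range(m):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % m]
        dx, dy = (x1 - x0) / n, (y1 - y0) / n
        for i in range(n):
            u = (i + 0.5) / n
            x, y = x0 + u * (x1 - x0), y0 + u * (y1 - y0)
            total += P(x, y) * dx + Q(x, y) * dy
    return total


def double_integral_unit_square(f, n=400):
    """Midpoint-rule estimate of the ordinary double integral of f over [0,1] x [0,1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f((i + 0.5) * h, (j + 0.5) * h)
    return total * h * h


def P(x, y):
    return x * y * y


def Q(x, y):
    return x ** 3


def curl(x, y):
    return 3 * x * x - 2 * x * y  # Q_x - P_y for the sample field


square_ccw = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
lhs = path_integral(P, Q, square_ccw)                      # boundary with induced orientation
rhs = double_integral_unit_square(curl)                    # positively oriented square
lhs_reversed = path_integral(P, Q, list(reversed(square_ccw)))
print(lhs, rhs, lhs_reversed)
```

Both sides come out near 1/2, and traversing the boundary in the opposite direction changes the sign of the path integral.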

Below we consider how Green’s theorem can be used as a tool for evaluating 
double integrals. More commonly, though, it is a tool for evaluating path integrals, 
and we consider this use first. 


Corollary 9.19 Suppose P = p(x) is a function of x alone, and Q = q(y) a function
of y alone; then

$$\oint_{\partial S} p(x)\,dx + q(y)\,dy = 0.$$

Proof. $Q_x - P_y = 0$, so $\iint_S (Q_x - P_y)\,dx\,dy = 0$. □

Recall that $\Phi(x,y)$ is called a potential (cf. p. 25) for the vector field $F(x,y) =
(P(x,y), Q(x,y))$ if $F = \operatorname{grad}\Phi$; that is, if $P = \partial\Phi/\partial x$, $Q = \partial\Phi/\partial y$.

Corollary 9.20 Suppose the vector field (P(x,y), Q(x,y)) has a potential $\Phi(x,y)$
that has continuous second derivatives on S; then

$$\oint_{\partial S} P\,dx + Q\,dy = 0.$$

Proof. $Q_x - P_y = \Phi_{yx} - \Phi_{xy} = 0$ on S when $\Phi$ has continuous second derivatives
on S. □

The second corollary is a generalization of the first when p(x) and q(y) are continuously
differentiable, because then we can take

$$\Phi(x,y) = \int p(x)\,dx + \int q(y)\,dy.$$

Green’s theorem can be used to evaluate a double integral by reducing it to a path
integral. Specifically, given

$$\iint_S f(x,y)\,dx\,dy, \qquad\text{set}\qquad F(x,y) = \int f(x,y)\,dx.$$

That is, F is a “partial integral” of f(x,y) with respect to x, which means only that
$\partial F/\partial x = f$. If we now take P(x,y) = 0 and Q(x,y) = F(x,y), then Green’s theorem
gives

$$\iint_S f(x,y)\,dx\,dy = \oint_{\partial S} F(x,y)\,dy.$$

We can even write this as

$$\iint_S f(x,y)\,dx\,dy = \oint_{\partial S} \left(\int f(x,y)\,dx\right) dy,$$

a kind of iterated integral in which one of the iterates is a path integral.
To illustrate, let us compute (cf. the example on p. 357)

$$\iint_S x^2\,dx\,dy,$$

where S is the positively oriented triangle with vertices (0,0), (1,1), and (0,1). We
have

$$\iint_S x^2\,dx\,dy = \oint_{\partial S} \left(\int x^2\,dx\right) dy = \oint_{\partial S} \frac{x^3}{3}\,dy.$$

The path $\partial S$ has three segments, but the path integral vanishes along two of them:
on the top, dy = 0; on the vertical side, x = 0. On the diagonal side, we can use x = y
as the parameter, so

$$\oint_{\partial S} \frac{x^3}{3}\,dy = \int_0^1 \frac{y^3}{3}\,dy = \frac1{12}.$$

Incidentally, when we convert the double integral of f(x,y) over S into the path
integral of

$$F(x,y) = \int f(x,y)\,dx$$

over $\partial S$, the partial integral F(x,y) is determined only up to an additive “constant”
with respect to the integration variable x. Such a “constant” is, in fact, an arbitrary
function of y. However, this indeterminacy has no effect on the outcome. Adding an
arbitrary function of y to the partial integral $x^3/3$ in the last example, we find

$$\iint_S x^2\,dx\,dy = \oint_{\partial S} \left(\int x^2\,dx\right) dy = \oint_{\partial S} \left(\frac{x^3}{3} + q(y)\right) dy = \oint_{\partial S} \frac{x^3}{3}\,dy,$$

because Corollary 9.19 implies

$$\oint_{\partial S} q(y)\,dy = 0.$$
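The indeterminacy can be seen numerically: adding any function q(y) to the partial integral $x^3/3$ leaves the boundary integral unchanged. In the sketch below, the particular q, the helper name `dy_path_integral`, and the midpoint rule are arbitrary illustrative choices.

```python
import math


def dy_path_integral(F, vertices, n=2000):
    """Midpoint-rule estimate of the integral of F(x, y) dy around a closed polygon."""
    total = 0.0
    m = len(vertices)
    for k in range(m):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % m]
        dy = (y1 - y0) / n
        for i in range(n):
            u = (i + 0.5) / n
            total += F(x0 + u * (x1 - x0), y0 + u * (y1 - y0)) * dy
    return total


def q(y):
    return math.sin(3.0 * y) + y * y  # an arbitrary (hypothetical) function of y alone


triangle_ccw = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # positively oriented

plain = dy_path_integral(lambda x, y: x ** 3 / 3.0, triangle_ccw)
shifted = dy_path_integral(lambda x, y: x ** 3 / 3.0 + q(y), triangle_ccw)
print(plain, shifted, 1.0 / 12.0)
```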


We turn now to the task of extending Green’s theorem to more general oriented
domains S. Our first step in this direction is to assume that S no longer admits both
descriptions that we use in computing iterated integrals, but only one of them. For
example, suppose we know only that

$$S:\ \begin{cases} a \le x \le b,\\ \gamma(x) \le y \le \delta(x). \end{cases}$$

Then our earlier proof of the equality

$$\oint_{\partial S} P\,dx = \iint_S -\frac{\partial P}{\partial y}\,dx\,dy$$

still holds, but we do need a new proof of the second half of the theorem,

$$\oint_{\partial S} Q\,dy = \iint_S \frac{\partial Q}{\partial x}\,dx\,dy,$$

because that depended on the now-absent second description of S.

To construct a new proof, let F be a “partial integral” of Q with respect to y:

$$F(x,y) = \int Q(x,y)\,dy \qquad\text{or}\qquad F_y(x,y) = Q(x,y).$$

Because we assume that Q has continuous first derivatives, F has continuous second
derivatives, and $Q_x = F_{yx} = F_{xy}$. We can express the double integral of $Q_x$ in terms
of F:

$$\iint_S Q_x(x,y)\,dx\,dy = \int_a^b\!\!\int_{\gamma(x)}^{\delta(x)} F_{xy}(x,y)\,dy\,dx = \int_a^b F_x(x,y)\Big|_{\gamma(x)}^{\delta(x)}\,dx = \int_a^b \bigl(F_x(x,\delta(x)) - F_x(x,\gamma(x))\bigr)\,dx.$$

We now show that the integral of Q over the path $\partial S$ reduces to the same expression,
making separate calculations on each of the four segments $C_1$, $C_2$, $C_3$, $C_4$. On
$C_1$ we can use x as the parameter with $y = \gamma(x)$ and $a \le x \le b$:

$$\int_{C_1} Q\,dy = \int_a^b Q(x,\gamma(x))\,\gamma'(x)\,dx.$$

Now consider $F(x,\gamma(x))$; because

$$\frac{d}{dx} F(x,\gamma(x)) = F_x(x,\gamma(x)) + F_y(x,\gamma(x))\,\gamma'(x),$$

the chain rule and the defining condition $F_y = Q$ give us

$$\int_a^b Q(x,\gamma(x))\,\gamma'(x)\,dx = \int_a^b \frac{d}{dx} F(x,\gamma(x))\,dx - \int_a^b F_x(x,\gamma(x))\,dx.$$

The first integral on the right equals $F(b,\gamma(b)) - F(a,\gamma(a))$, so

$$\int_{C_1} Q\,dy = F(b,\gamma(b)) - F(a,\gamma(a)) - \int_a^b F_x(x,\gamma(x))\,dx.$$

On $C_2$, x = b and we can use y as the parameter with $\gamma(b) \le y \le \delta(b)$:

$$\int_{C_2} Q\,dy = \int_{\gamma(b)}^{\delta(b)} Q(b,y)\,dy = \int_{\gamma(b)}^{\delta(b)} F_y(b,y)\,dy = F(b,\delta(b)) - F(b,\gamma(b)).$$

On $C_3$, we can again take x as the parameter, but now $y = \delta(x)$ and we must
integrate with respect to x from b to a:

$$\int_{C_3} Q\,dy = \int_b^a Q(x,\delta(x))\,\delta'(x)\,dx = -\int_a^b Q(x,\delta(x))\,\delta'(x)\,dx.$$

Using $F(x,\delta(x))$ and an argument similar to the one for $C_1$, we find

$$\int_{C_3} Q\,dy = -F(b,\delta(b)) + F(a,\delta(a)) + \int_a^b F_x(x,\delta(x))\,dx.$$

On $C_4$, x = a and we can again use y as the parameter, but must now integrate
from $\delta(a)$ to $\gamma(a)$:

$$\int_{C_4} Q\,dy = \int_{\delta(a)}^{\gamma(a)} Q(a,y)\,dy = \int_{\delta(a)}^{\gamma(a)} F_y(a,y)\,dy = F(a,\gamma(a)) - F(a,\delta(a)).$$

Therefore,

$$\oint_{\partial S} Q\,dy = \int_{C_1} Q\,dy + \int_{C_2} Q\,dy + \int_{C_3} Q\,dy + \int_{C_4} Q\,dy = \int_a^b F_x(x,\delta(x))\,dx - \int_a^b F_x(x,\gamma(x))\,dx = \iint_S Q_x\,dx\,dy$$

when all the cancellations are taken into account. □


The same arguments, mutatis mutandis, allow us to prove Green’s theorem for a
region S when we have only the single description

$$S:\ \begin{cases} c \le y \le d,\\ \alpha(y) \le x \le \beta(y). \end{cases}$$

Our final version of Green’s theorem uses oriented regions S that are finite unions
of the two kinds we have already considered. In particular, we allow the boundary
$\partial S$ to have more than one component, although each component will still have the
orientation induced by S. Thus (cf. p. 355), if n is the outward-pointing unit normal
at any smooth point of $\partial S$, then the orienting unit tangent t for $\partial S$ at that point is
chosen so that the pair {n, t} agrees with the orientation of S itself. We assume,
as in the figure below, that S is closed, bounded, and oriented, and that it can be
subdivided into a finite number of nonoverlapping closed cells $S_1, \ldots, S_N$ with the
same orientation as S. As the figure suggests, this can often be accomplished with
properly placed vertical or horizontal lines.



To prove that Green’s theorem holds on S, consider separately the double integral
and the path integral. For the double integral we have

$$\iint_S (Q_x - P_y)\,dx\,dy = \iint_{S_1} (Q_x - P_y)\,dx\,dy + \cdots + \iint_{S_N} (Q_x - P_y)\,dx\,dy$$

immediately, by Theorem 8.27, page 298.

The path integrals combine in a more interesting way. If two cells $S_i$ and $S_j$ have
a boundary segment C in common, then their outward normals point in opposite
directions on C, because $S_i$ and $S_j$ are on opposite sides of C. Therefore, the orientation
of C as part of $\partial S_i$ is opposite its orientation as part of $\partial S_j$, so the contributions
that C makes to

$$\oint_{\partial S_i} P\,dx + Q\,dy \qquad\text{and}\qquad \oint_{\partial S_j} P\,dx + Q\,dy$$

exactly cancel. The only contributions that do not cancel are from those boundary
segments C that $S_i$ shares with S itself. By construction, the orientation of C as part of
$\partial S_i$ is the same as its orientation as part of $\partial S$. Therefore, after all the cancellations
are taken into account,

$$\oint_{\partial S} P\,dx + Q\,dy = \oint_{\partial S_1} P\,dx + Q\,dy + \cdots + \oint_{\partial S_N} P\,dx + Q\,dy.$$

Thus, because Green’s theorem holds for each $S_i$, it holds for S:

$$\iint_S (Q_x - P_y)\,dx\,dy = \oint_{\partial S} P\,dx + Q\,dy. \qquad\square$$


Of course, if the orientation of $\partial S$ is opposite the orientation induced by the
orientation of S, then

$$\oint_{\partial S} P\,dx + Q\,dy = -\iint_S (Q_x - P_y)\,dx\,dy.$$


Using Green’s theorem, we now establish yet another version of the change of 
variables formula. As with the previous version (Theorem 9.14), this one applies to 
oriented integrals. For that reason, it uses the Jacobian itself, rather than its abso¬ 
lute value. Unlike the previous version, it does not require the map f that changes 
variables to be 1-1. Therefore, because we no longer assume there is a 1-1 corre¬ 
spondence between points of S and points of T = f(S), we cannot assume that an 
orientation of S induces (cf. p. 355) an orientation of T. 

Thus let T = f(S), and suppose S and T are independently oriented regions,
with Green’s theorem holding on each. Assume $\partial S$ and $\partial T$ are simple, piecewise-smooth
closed curves. The image $f(\partial S)$ need not be simple; instead, we assume
only that $f(\partial S) \subseteq \partial T$ as sets, that $\partial S$ and $f(\partial S)$ have a common decomposition into
smooth oriented curves, and

$$f(\partial S) = k\,\partial T$$

as oriented paths. The integer k counts the number of times f wraps $\partial S$ around $\partial T$ in
the positive direction, minus the number of times it wraps in the negative direction.
In the figure below (where the images have been separated for clarity), k = +1.

To compensate for the possibility that f is not 1-1, we require that it now have
continuous second derivatives.

Theorem 9.21 (Change of variables with Green’s theorem).
Suppose f(s,t) = (x(s,t), y(s,t)) has continuous second derivatives on a bounded
open set $\Omega$ in $\mathbb{R}^2$. Let $S \subset \Omega$ and T = f(S) be closed, independently oriented sets
whose boundaries $\partial S$ and $\partial T$ are simple closed curves. Assume that Green’s theorem
holds for both S and T, that $\partial S$ and $f(\partial S)$ have common decompositions into
smooth oriented curves, and that $f(\partial S) = k\,\partial T$ as oriented paths. Then, for any
continuous function g(x,y) on T,

$$k \iint_T g(x,y)\,dx\,dy = \iint_S g(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt.$$
Proof. Because Green’s theorem holds for the region T, we can write

$$\iint_T g(x,y)\,dx\,dy = \oint_{\partial T} G(x,y)\,dy,$$

where G(x,y) is some function for which $G_x(x,y) = g(x,y)$ (i.e., a “partial integral”).
Because $k\,\partial T = f(\partial S)$, we have

$$k \iint_T g(x,y)\,dx\,dy = k \oint_{\partial T} G(x,y)\,dy = \oint_{k\,\partial T} G(x,y)\,dy = \oint_{f(\partial S)} G(x,y)\,dy.$$

Now apply Exercise 4.37 (p. 149) to f to transform the last path integral:

$$\oint_{f(\partial S)} G(x,y)\,dy = \oint_{\partial S} G(x(s,t), y(s,t))\,(y_s\,ds + y_t\,dt) = \oint_{\partial S} G^* y_s\,ds + G^* y_t\,dt.$$

(Here $G^*(s,t) = G(x(s,t), y(s,t))$.) Applying Green’s theorem a second time, we
transform this new path integral over $\partial S$ back into a double integral, but now one
over S:

$$\oint_{\partial S} G^* y_s\,ds + G^* y_t\,dt = \iint_S \bigl((G^* y_t)_s - (G^* y_s)_t\bigr)\,ds\,dt.$$

The terms in the new double integral are

$$(G^* y_t)_s = \frac{\partial}{\partial s}\bigl(G(x(s,t), y(s,t)) \cdot y_t(s,t)\bigr) = (G_x^* x_s + G_y^* y_s)\,y_t + G^* y_{ts}$$

and

$$(G^* y_s)_t = \frac{\partial}{\partial t}\bigl(G(x(s,t), y(s,t)) \cdot y_s(s,t)\bigr) = (G_x^* x_t + G_y^* y_t)\,y_s + G^* y_{st}.$$

Therefore,

$$(G^* y_t)_s - (G^* y_s)_t = G_x^*\,(x_s y_t - x_t y_s) + G^*\,(y_{ts} - y_{st}).$$

The second term vanishes because $y_{ts} - y_{st} = 0$; this is where we need the
hypothesis that the map f has continuous second derivatives. Finally, because $G_x^* =
g(x(s,t), y(s,t))$ and $x_s y_t - x_t y_s$ is the Jacobian of f, we have

$$\iint_S \bigl((G^* y_t)_s - (G^* y_s)_t\bigr)\,ds\,dt = \iint_S g(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt. \qquad\square$$


For our first example of a noninvertible change of variables, we use the quadratic
map $f : \mathbb{R}^2 \to \mathbb{R}^2$,

$$f:\ \begin{cases} x = s^2 - t^2,\\ y = 2st, \end{cases} \qquad J(s,t) = \frac{\partial(x,y)}{\partial(s,t)} = \begin{vmatrix} 2s & -2t \\ 2t & 2s \end{vmatrix} = 4(s^2 + t^2),$$

that we analyzed in Chapter 4 (cf. pp. 116-121). We saw there that f squares distances
from the origin and doubles polar angles. Away from the origin, J > 0 and f
is locally 1-1, by the inverse function theorem.

Let S be the unit disk in the (s,t)-plane; its image T = f(S) is the unit disk in the
(x,y)-plane. Note that $S_1$, the upper half of S, already covers all of T, and so does
the lower half, $S_2$. The images of the boundaries of $S_1$ and $S_2$ are the same; however,
in the figure below they have been separated slightly, for clarity.

For the sake of illustration, let S have positive orientation and T negative. Then the
image of $\partial S$ wraps twice around $\partial T$, but with the opposite orientation:

$$f(\partial S) = -2\,\partial T.$$

For any continuous function g(x,y) on T, Theorem 9.21 asserts that

$$-2 \iint_T g(x,y)\,dx\,dy = \iint_S g(s^2 - t^2, 2st)\;4(s^2 + t^2)\,ds\,dt.$$

For instance, if g(x,y) = 1, then the assertion reduces to

$$-2 \operatorname{area} T = -2 \iint_T dx\,dy = \iint_S 4(s^2 + t^2)\,ds\,dt.$$

To verify this, note that T is a negatively oriented unit disk, so $\operatorname{area} T = -\pi$ and
the left-hand side is $+2\pi$. The right-hand side has the same value, as we can see by
making a change to polar coordinates:

$$\iint_S 4(s^2 + t^2)\,ds\,dt = \int_0^{2\pi}\! d\theta \int_0^1 4r^2\,r\,dr = 2\pi\, r^4 \Big|_0^1 = 2\pi.$$

For a second instance, take $g(x,y) = x^2$; then we must verify that

$$-2 \iint_T x^2\,dx\,dy = \iint_S (s^2 - t^2)^2\;4(s^2 + t^2)\,ds\,dt.$$

Because T is negatively oriented, we have

$$-2 \iint_T x^2\,dx\,dy = +2 \iint_T x^2\,dA = 2 \int_0^{2\pi}\!\!\int_0^1 r^2 \cos^2\theta\; r\,dr\,d\theta = 2 \int_0^{2\pi} \cos^2\theta\,d\theta \int_0^1 r^3\,dr = 2 \cdot \pi \cdot \frac14 = \frac{\pi}{2}.$$

The integral over S can also be evaluated by a change to polar coordinates in which
$s = r\cos\theta$, $t = r\sin\theta$. Because

$$(s^2 - t^2)^2 = (r^2\cos^2\theta - r^2\sin^2\theta)^2 = r^4 \cos^2 2\theta,$$

we find

$$\iint_S (s^2 - t^2)^2\;4(s^2 + t^2)\,ds\,dt = \int_0^{2\pi} \cos^2 2\theta\,d\theta \int_0^1 4r^7\,dr = \pi \times \frac12 = \frac{\pi}{2}.$$
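Both ingredients of this example, the wrapping number and the two integrals over S, can be checked numerically. The sketch below makes its own choices: it measures how many times the image of the counterclockwise unit circle turns around the origin by summing small signed angles, and it evaluates the integrals over S by a midpoint rule in polar coordinates. The function names are illustrative.

```python
import math


def f(s, t):
    """The quadratic map (s, t) -> (s^2 - t^2, 2st)."""
    return (s * s - t * t, 2 * s * t)


def winding_number(n=1000):
    """Net counterclockwise turns of f(unit circle) about the origin."""
    total = 0.0
    x0, y0 = f(1.0, 0.0)
    for k in range(1, n + 1):
        theta = 2.0 * math.pi * k / n
        x1, y1 = f(math.cos(theta), math.sin(theta))
        # signed angle from (x0, y0) to (x1, y1)
        total += math.atan2(x0 * y1 - x1 * y0, x0 * x1 + y0 * y1)
        x0, y0 = x1, y1
    return round(total / (2.0 * math.pi))


def integral_over_S(g, n=400):
    """Midpoint rule in polar coordinates for g(f(s,t)) * 4(s^2 + t^2) over the unit disk."""
    hr, hth = 1.0 / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            th = (j + 0.5) * hth
            s, t = r * math.cos(th), r * math.sin(th)
            x, y = f(s, t)
            total += g(x, y) * 4.0 * (s * s + t * t) * r
    return total * hr * hth


turns = winding_number()  # counterclockwise turns; with T negatively oriented, k = -turns
print(turns)
print(integral_over_S(lambda x, y: 1.0), 2.0 * math.pi)
print(integral_over_S(lambda x, y: x * x), math.pi / 2.0)
```

The image circle turns twice counterclockwise, so relative to the clockwise boundary of the negatively oriented T the multiplier is k = −2, in agreement with the values $2\pi$ and $\pi/2$.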


For our second example of a change of variables that transforms integrals using 
Green’s theorem, we take 


f: 



J{s,t) 


d{x,y) 1 0 
0 2 1 


The Jacobian $J(s,t)$ is positive in the upper half-plane and negative in the lower; it
changes sign on the $s$-axis. The map $\mathbf f$ is a fold (cf. Exercise 4.21, p. 146). It folds the
$(s,t)$-plane along the $s$-axis; points that are symmetrically placed across the $s$-axis
have the same image.

Let $S$ be the rectangle $0 \le s \le 1$, $-1 \le t \le 1$ with positive orientation. Where $J$
changes sign, split $\vec S$ into two positively oriented nonoverlapping cells $\vec S_1$ ($t \ge 0$) and
$\vec S_2$ ($t \le 0$), so that we can write $\vec S = \vec S_1 + \vec S_2$. The image $T = \mathbf f(S)$ is the unit square
$0 \le x \le 1$, $0 \le y \le 1$; if we make $T$ positively oriented, then

\[ \vec T = \mathbf f(\vec S_1) = -\mathbf f(\vec S_2). \]

Notice that the image of the boundary, $\mathbf f(\partial\vec S)$, is a proper subset of the boundary
of $T$; it does not "wrap around" $\partial T$. In fact, we can write $\mathbf f(\partial\vec S)$ as $\vec C - \vec C$, where $\vec C$
goes around three sides of $T$. This implies

\[ \mathbf f(\partial\vec S) = 0 \times \partial\vec T \]

as an oriented path. Therefore, by Theorem 9.21,

\[ \iint_{\vec S} g(s,t^2)\,2t\,ds\,dt = 0 \times \iint_{\vec T} g(x,y)\,dx\,dy = 0 \]

for any continuous function $g(x,y)$ defined on the unit square $T$.

To see, from another perspective, why the integral of any function of the form
$G(s,t) = 2t\,g(s,t^2)$ over $S$ must automatically equal zero, we note two things. First,
$G(s,t)$ is an odd function of $t$ (i.e., $G(s,-t) = -G(s,t)$). Second, the region $S$ is
symmetric across the $s$-axis: the point $(s,t)$ is in $S$ if and only if $(s,-t)$ is. Therefore,
if we write

\[ \iint_{\vec S} g(s,t^2)\,2t\,ds\,dt = \int_0^1 ds \int_{-1}^{1} G(s,t)\,dt, \]

then we see that we must integrate an odd function of $t$ over a $t$-interval that is
symmetric about the origin. The value of such an integral is always zero (Exercise 9.26).
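The cancellation can be watched numerically. The sketch below is illustrative only: the test function `g` is an arbitrary choice, and `midpoint_2d` is a hypothetical helper, neither taken from the text. It integrates $G(s,t) = 2t\,g(s,t^2)$ separately over the upper cell $S_1$ and the lower cell $S_2$ of the fold example.

```python
import math

def midpoint_2d(func, s0, s1, t0, t1, n=200):
    """Midpoint-rule approximation of the double integral of func(s, t)
    over the rectangle s0 <= s <= s1, t0 <= t <= t1."""
    ds = (s1 - s0) / n
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * ds
        for j in range(n):
            t = t0 + (j + 0.5) * dt
            total += func(s, t)
    return total * ds * dt

def g(x, y):
    # Arbitrary continuous test function on the unit square (an assumption).
    return math.exp(x) * math.cos(y) + y * y

def G(s, t):
    # Pulled-back integrand g(f(s, t)) * J(s, t) for the fold f(s, t) = (s, t^2).
    return g(s, t * t) * 2 * t

upper = midpoint_2d(G, 0.0, 1.0, 0.0, 1.0)    # integral over S_1 (t >= 0)
lower = midpoint_2d(G, 0.0, 1.0, -1.0, 0.0)   # integral over S_2 (t <= 0)
target = midpoint_2d(g, 0.0, 1.0, 0.0, 1.0)   # integral of g over T
print(upper, lower, upper + lower, target)
```

The upper cell should reproduce the integral of $g$ over $T$, the lower cell its negative, and their sum should vanish up to rounding, which is the statement $\vec T = \mathbf f(\vec S_1) = -\mathbf f(\vec S_2)$ in numerical form.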

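Our third example again uses the fold map, but applies it to a region on which the
boundary condition of Theorem 9.21 fails. The region is the parallelogram $S$ shown
below. Because parts of $\mathbf f(\partial S)$ lie in the interior of $T$, $\mathbf f(\partial S) \not\subseteq \partial T$. Therefore, no
matter how $S$ and $T$ are oriented, no integer $k$ can be found for which

\[ \mathbf f(\partial\vec S) = k\,\partial\vec T. \]

Our fourth example is the map

\[ \mathbf f:\ \begin{cases} x = s, \\ y = t^3 - \tfrac13 st, \end{cases} \qquad
   J(s,t) = \frac{\partial(x,y)}{\partial(s,t)} = \begin{vmatrix} 1 & 0 \\ -\tfrac13 t & 3t^2 - \tfrac13 s \end{vmatrix} = 3t^2 - \tfrac13 s. \]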


For reasons that soon become clear, this is called a pleat. Let $S$ be the positively
oriented rectangle $-1 \le s \le 2$, $-1 \le t \le 1$. Note that $\mathbf f$ preserves vertical lines,
because $x = c$ when $s = c$. Horizontal lines are not preserved, but each is mapped
to some straight line. Specifically, the horizontal line $t = k$ is mapped to the line
$y = k^3 - kx/3$ (using $s = x$) with slope $-k/3$ and $y$-intercept $k^3$. Therefore, the image
of $S$ is the trapezoid

\[ T:\quad -1 \le x \le 2, \qquad \tfrac13 x - 1 \le y \le -\tfrac13 x + 1, \]

that we define to be positively oriented. In that case, $\mathbf f(\partial\vec S) = +1 \times \partial\vec T$.

Let us compute the area of the trapezoid $T$ first using a double integral provided
by the change of variables formula, and then by elementary geometry. The integral
gives

\[ \operatorname{area} T = +1 \times \iint_{\vec T} dx\,dy = \iint_{\vec S} \bigl(3t^2 - \tfrac13 s\bigr)\,ds\,dt
   = \int_{-1}^{2}\!\!\int_{-1}^{1} \bigl(3t^2 - \tfrac13 s\bigr)\,dt\,ds
   = \int_{-1}^{2} \Bigl[\, t^3 - \tfrac13 st \Bigr]_{-1}^{1} ds \]
\[ \phantom{\operatorname{area} T} = \int_{-1}^{2} \bigl(2 - \tfrac23 s\bigr)\,ds = \Bigl[\, 2s - \tfrac13 s^2 \Bigr]_{-1}^{2}
   = 4 - \tfrac43 - \bigl(-2 - \tfrac13\bigr) = 5. \]


As a figure in elementary geometry, $T$ has "height" $H = 3$ and "bases" $B_1 = 2\tfrac23$,
$B_2 = \tfrac23$; therefore, we find once again that

\[ \operatorname{area} T = H\,\frac{B_1 + B_2}{2} = 5. \]
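Both computations of the area can be reproduced exactly with rational arithmetic. This is a sketch under the assumption, recovered from the formulas above, that the pleat is $\mathbf f(s,t) = (s,\, t^3 - st/3)$ with $J = 3t^2 - s/3$; the helper names `simpson` and `shoelace` are illustrative. Simpson's rule on a single panel is exact for polynomials of degree at most three, so it integrates $J$ without error here.

```python
from fractions import Fraction as F

def f(s, t):
    # The pleat map (assumed form, recovered from the surrounding formulas).
    return (s, t**3 - s * t / 3)

def J(s, t):
    # Jacobian of the pleat map.
    return 3 * t**2 - s / 3

def simpson(func, a, b):
    """One-panel Simpson rule; exact for polynomials of degree <= 3."""
    m = (a + b) / 2
    return (b - a) / 6 * (func(a) + 4 * func(m) + func(b))

def area_by_jacobian():
    # Integrate J over the rectangle -1 <= s <= 2, -1 <= t <= 1, t first.
    inner = lambda s: simpson(lambda t: J(s, t), F(-1), F(1))
    return simpson(inner, F(-1), F(2))

def shoelace(points):
    """Signed area of a polygon; positive for counterclockwise vertices."""
    total = F(0)
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2

# Corners of S in counterclockwise order and their images, the corners of T.
corners = [(F(-1), F(-1)), (F(2), F(-1)), (F(2), F(1)), (F(-1), F(1))]
image = [f(s, t) for s, t in corners]
print(area_by_jacobian(), shoelace(image))
```

The shoelace formula applies because the four edges of $T$ are straight: the top and bottom edges are the images of $t = \pm 1$, and the vertical sides are preserved by $\mathbf f$.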


The map $\mathbf f$ has subtleties that, among other things, make the validity of the
transformation of integrals more surprising than we might at first imagine. The family of
horizontal lines in the figure below shows clearly that $\mathbf f$ makes a "pleat" in the
target. The pleat is made up of two folds that come together at the origin of the target.
Inside the pleat, each target point is the image of three points in the source; outside,
only one.



To make the pleating more apparent, we have sheared the rectangle into a
parallelogram $P$. This makes the image of the right edge of $P$ follow a cubic curve that shows
how the pleat is folded. As a result, $\mathbf f(\partial P) \not\subseteq \partial(\mathbf f(P))$: part of that cubic curve lies
in the interior of $\mathbf f(P)$, and the image of part of the interior of $P$ lies on $\partial(\mathbf f(P))$.
Although the boundary condition of Theorem 9.21 holds on $S$, it does not hold on $P$,
even though the shear that changes $S$ into $P$ can be as slight as we wish.

The source is folded to make the pleat. Given any point in the source near that
fold, there is another point (on the other side of the fold) that has the same image.
In other words, $\mathbf f$ is never locally 1-1 at a fold point. But, by the inverse function
theorem, $\mathbf f$ is locally 1-1 near any point $(s,t)$ where $J(s,t) \ne 0$. Therefore, the fold
points of $\mathbf f$ must occur where $J = 0$, on the parabola $s = 9t^2$. The pleat is the image
of this parabola; that is, the pleat is the set of points $\mathbf f(9t^2, t)$. It therefore has the
parametric equations

\[ x = 9t^2, \qquad y = t^3 - \tfrac13\, 9t^3 = -2t^3. \]

The pleat makes a cusp at the origin. We can see this afresh by setting $t = \sqrt[3]{y/(-2)}$,
so that

\[ x = 9\bigl(\sqrt[3]{y/(-2)}\bigr)^2 = \frac{9}{\sqrt[3]{4}}\; y^{2/3}. \]

This is a function whose graph is a cusp.
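A quick exact check of the fold locus and its image: cubing $x = 9\,(y/(-2))^{2/3}$ gives the root-free form $4x^3 = 729\,y^2$, which is convenient to test with rational numbers. The sketch assumes the pleat map $\mathbf f(s,t) = (s,\, t^3 - st/3)$ recovered above; the sample values of $t$ and the name `on_cusp` are illustrative.

```python
from fractions import Fraction as F

def f(s, t):
    # The pleat map (assumed form, recovered from the surrounding formulas).
    return (s, t**3 - s * t / 3)

def J(s, t):
    # Jacobian of the pleat map; it vanishes on the parabola s = 9 t^2.
    return 3 * t**2 - s / 3

def on_cusp(x, y):
    # Root-free form of x = 9 (y / -2)^(2/3): cube both sides.
    return 4 * x**3 == 729 * y**2

samples = [F(k, 10) for k in range(-10, 11)]          # t = -1, -0.9, ..., 1
fold_points = [(9 * t * t, t) for t in samples]        # points of the parabola
fold_images = [f(s, t) for s, t in fold_points]        # their images on the pleat

print(all(J(s, t) == 0 for s, t in fold_points))
print(all(on_cusp(x, y) for x, y in fold_images))
```

Both lines should print `True`: every sampled point of the parabola is a zero of $J$, and every image point lies on the cusp.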

As we have seen, f is a threefold cover of the inside of the cusp. It would appear 
that calculating the area of the trapezoid f(S) by integrating over S would count the 
area inside the cusp three times, instead of just once. But note that f reverses the 
orientation of one of the three “sheets” that cover the inside of the cusp (namely, the 
inside of the parabola). Let us now see how this leads to the correct outcome. 







We partition $S$ into the four nonoverlapping, positively oriented regions shown
above, so that $\vec S = \vec S_1 + \vec S_2 + \vec S_3 + \vec S_4$. On each region, $\mathbf f$ is 1-1. The change of variables
formula implies that the area of $T$ is

\[ \iint_{\vec S} J\,ds\,dt = \iint_{\vec S_1} J\,ds\,dt + \iint_{\vec S_2} J\,ds\,dt + \iint_{\vec S_3} J\,ds\,dt + \iint_{\vec S_4} J\,ds\,dt, \]

where $J = 3t^2 - \tfrac13 s$ in every case. The value of the first integral on the right is $2\tfrac13$,
the area of the trapezoid $\mathbf f(S_1)$. For the second integral, we have

\[ \iint_{\vec S_2} J\,ds\,dt = \int_0^2\!\!\int_{\sqrt s/3}^{1} \bigl(3t^2 - \tfrac13 s\bigr)\,dt\,ds = \frac43 + \frac{16\sqrt2}{135}. \]

The part of $\mathbf f(S_2)$ that lies in the first quadrant is a trapezoid whose area is $4/3$;
the remaining part, the curvilinear triangle in the fourth quadrant, must therefore
account for the remaining area, namely $16\sqrt2/135 \approx 0.17$. By symmetry, the fourth
integral has the same value, and breaks down in the same way:

\[ \iint_{\vec S_4} J\,ds\,dt = \frac43 + \frac{16\sqrt2}{135}. \]


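We see that $\mathbf f(S_1)$ and the trapezoidal parts of $\mathbf f(S_2)$ and $\mathbf f(S_4)$ already completely
cover $T$, and their total area is $2\tfrac13 + 2\cdot\tfrac43 = 5$, the area of $T$. The integrals over $S_2$
and $S_4$ therefore produce an excess of $+32\sqrt2/135$. With this in mind, consider the
third integral:

\[ \iint_{\vec S_3} J\,ds\,dt = \int_0^2\!\!\int_{-\sqrt s/3}^{\sqrt s/3} \bigl(3t^2 - \tfrac13 s\bigr)\,dt\,ds
   = \int_0^2 \Bigl[\, t^3 - \tfrac13 st \Bigr]_{-\sqrt s/3}^{\sqrt s/3} ds
   = -\frac{8\cdot 2^{5/2}}{135} = -\frac{32\sqrt2}{135}. \]

This is negative and exactly cancels the excess contributions made by $S_2$ and $S_4$.
The sum of the four oriented integrals is 5.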
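The bookkeeping of the four pieces can be checked numerically. In the sketch below the regions are assumed to be $S_1$: $-1 \le s \le 0$; $S_2$: $0 \le s \le 2$ above the parabola $s = 9t^2$; $S_3$: inside the parabola; $S_4$: below it. That reading is inferred from the limits of integration above, since the figure itself is not reproduced here. The inner integral uses the antiderivative $t^3 - st/3$ exactly, and only the outer integral is approximated; `inner` and `outer` are illustrative helper names.

```python
import math

def inner(s, t_lo, t_hi):
    """Exact inner integral of J = 3t^2 - s/3 with respect to t."""
    antiderivative = lambda t: t**3 - s * t / 3
    return antiderivative(t_hi) - antiderivative(t_lo)

def outer(s_lo, s_hi, t_lo, t_hi, n=2000):
    """Midpoint rule in s; t_lo and t_hi are functions of s."""
    h = (s_hi - s_lo) / n
    total = 0.0
    for i in range(n):
        s = s_lo + (i + 0.5) * h
        total += inner(s, t_lo(s), t_hi(s))
    return total * h

root = lambda s: math.sqrt(s) / 3          # upper branch of the parabola s = 9 t^2

I1 = outer(-1.0, 0.0, lambda s: -1.0, lambda s: 1.0)
I2 = outer(0.0, 2.0, root, lambda s: 1.0)
I3 = outer(0.0, 2.0, lambda s: -root(s), root)
I4 = outer(0.0, 2.0, lambda s: -1.0, lambda s: -root(s))

excess = 16 * math.sqrt(2) / 135
print(I1, I2, I3, I4, I1 + I2 + I3 + I4)
```

The four values should be close to $7/3$, $4/3 + 16\sqrt2/135$, $-32\sqrt2/135$ and $4/3 + 16\sqrt2/135$, with total 5.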

Notice that although $\mathbf f(\partial\vec S) = \partial\vec T$ as sets and as oriented paths in Example 4, $\mathbf f$ is
not a 1-1 map of $\partial S$ to $\partial T$: the image of $\partial S$ "doubles back" on itself briefly inside
the cusp. However, all we need (for the proof of the change of variables theorem to
hold) is that $\mathbf f(\partial\vec S)$ make a single traversal of $\partial T$ in the sense induced by $\vec T$, because
then

\[ \oint_{\mathbf f(\partial\vec S)} P(x,y)\,dx + Q(x,y)\,dy = \oint_{\partial\vec T} P(x,y)\,dx + Q(x,y)\,dy \]

for any continuous integrands $P(x,y)$ and $Q(x,y)$.

Exercises


9.1. Use the pullback substitution $y = \sqrt{x^2 + a^2}\,\tan\theta$ (here $y$ is a function of $\theta$;
$x$ is fixed) to show that

\[ \int_0^R \frac{dy}{(x^2 + y^2 + a^2)^{3/2}} = \frac{R}{(x^2 + a^2)(x^2 + R^2 + a^2)^{1/2}}. \]

9.2. Show that

\[ \frac{1}{\sqrt{1 + a^2/2R^2}} = 1 + O\bigl((a/R)^2\bigr) \quad\text{as } a/R \to 0 \]

and then provide the details to show that

\[ \frac{R^2}{a\sqrt{2R^2 + a^2}} = \frac{R}{a\sqrt2} + O(a/R) \quad\text{as } a/R \to 0. \]


9.3. Determine the value of each of the following iterated integrals. (Note the
different orders "$dy\,dx$" and "$dx\,dy$".) Which integrals are equal and which are
not? Is that what you expect?


a. 

b. 


d. 


a: (x 2 + xy 3 ) dy dx = ai: (x 2 + xy 3 ) dy^j dx 

n (x^ +xy i )dxdy 


a: (x 2 +xy^)dxdy 

r 5 r 3 2 3 

/ / (x^ +xy i )dvdx 

J 1 70 


9.4. Evaluate: 

rl r2y +1 

a. / / xydxdy 

J 1 Jy-5 


b. 




9.5. Sketch, in the $(x,y)$-plane, the domain of integration of each of the integrals in
Exercises 9.3 and 9.4.


9.6. Sketch each of the following regions in the $(x,y)$-plane and then describe it in
the form

\[ a \le x \le b, \qquad \gamma(x) \le y \le \delta(x). \]

a. The unit circle (the circle of radius 1 centered at the origin).

b. The circle of radius 3 centered at $(5,-1)$.

c. The circle of radius $R$ centered at $(p,q)$.

d. The bounded region that lies between the graphs of $y = x^2$ and $y = 4$.

e. The bounded region in the first quadrant that lies between the graphs $y = x^2$
and $x = y^2$.

f. The triangle with vertices at $(0,0)$, $(5,5)$, and $(0,5)$.

g. The diamond-shaped (or lozenge-shaped) region whose vertices are at the
points $(1,0)$, $(0,1)$, $(-1,0)$, and $(0,-1)$.

h. The region shown shaded in the margin. Write $P$ as $(p,q)$; you need not
determine the values of $p$ or $q$.

9.7. Describe each of the following regions in the form

\[ c \le y \le d, \qquad \alpha(y) \le x \le \beta(y). \]

a. The circle of radius $R$ centered at $(x,y) = (p,q)$.

b. The bounded region that lies between the graphs of $y = x^2$ and $x = y^2$.

c. The triangle with vertices at $(0,0)$, $(5,5)$, and $(0,5)$.

d. The triangle with vertices at $(0,0)$, $(5,5)$, and $(5,0)$.

e. The region where $0 \le x \le 2$, $0 \le y \le x$. Sketch this!

f. The region where $0 \le x \le 2$, $x^3 \le y \le 10 - x$.



9.8. Reverse the order of integration in the following integral: 



That is, rewrite it as an iterated integral with the integration done first with
respect to $x$, and then with respect to $y$ (i.e., "$dx\,dy$" instead of "$dy\,dx$").

9.9. By reversing the order of integration, show that 



e— 1 
2 


What happens when you try to determine the integral directly, without
reversing the order of integration?

9.10. In each of the following, reverse the order of integration and then calculate 
the new integral. 



9.11. Evaluate the given double integral using iterated integrals. 


a. $\iint_R (x^2 + xy^3)\,dA$, $R$: rectangle with vertices $(0,1)$, $(3,1)$, $(3,5)$, $(0,5)$

b. $\iint_D xy\,dA$, $D$: bounded region between the graphs $y = x^2$ and $x = y^2$

°■ h a ii 


x dA 


-KKl 

0<v<3 


JJ X 

K : disk of radius 

II X 

K : disk of radius 

JJ xy dA, 

T : triangle with 


0<y<2 
y<x<A y 


9.12. Express, as a double integral, the volume of the solid bounded by the planes 
x = 0, y = 0, z = 0, and x +y+z = 4. Then determine the volume by evaluating 
the integral. 


9.13. The double integral

\[ \iint_{(x-p)^2 + (y-q)^2 \le R^2} H\,dA, \qquad H \text{ constant}, \]

is the volume of a familiar solid shape. Describe the shape quite precisely and
use that knowledge (rather than an iterated integral) to determine the value of
the integral.

9.14. The double integral

\[ \iint_{x^2 + y^2 \le R^2} \sqrt{R^2 - x^2 - y^2}\;dA \]

is the volume of a familiar solid shape. What is the shape? Use that knowledge
(rather than an iterated integral) to determine the value of the integral.

9.15. Show that

\[ \iint_D \frac{\partial^2 f}{\partial x\,\partial y}\,dA = f(P) - f(Q) + f(R) - f(S), \]

where $P = (a,c)$, $Q = (b,c)$, $R = (b,d)$, and $S = (a,d)$ are the vertices of the
rectangle $D$.

9.16. In Exercise 9.15, take $f(x,y) = xy$ and $a = c = 0$. Then show, using separate
observations or arguments, that the expression $f(P) - f(Q) + f(R) - f(S)$
and the double integral both equal the area of $D$.

9.17. The volume of the “ravioli” z = sinxsiny shown in the margin is 



sinxsiny dydx. 


Evaluate this integral using the result in Exercise 9.15. What is $f(x,y)$ in this
case?

9.18. a. What is the mean, or average, value $\bar z$ (cf. Definition 3.1, p. 75) of the
function $ax + by + c$ on the circle $x^2 + y^2 \le R^2$? Does the value depend on
the coefficients $a$ or $b$?

b. The double integral

\[ \iint_{x^2 + y^2 \le R^2} (ax + by + c)\,dA \]

is the volume of a certain solid shape. Describe the shape and make a
general sketch (take $c > R > 0$). Determine the volume directly from your
knowledge of $\bar z$, without calculating the integral again.

c. Without calculating the following integral, explain why

\[ \iint_{x^2 + y^2 \le R^2} (ax + by)\,dA = 0, \]

for $R > 0$ and for any $a$ and $b$.

9.19. a. (This concerns a function of a single variable, and it appears here to
provide a comparison with the result of the previous exercise.) What is the
average value $\bar y$ of $y = mx + b$ on the interval $-R \le x \le R$?

b. Does $\bar y$ depend on $m$? Draw a graph of $y = mx + b$ that explains your
answer. (To make a concrete graph, try $m = 1/2$ and $b = 3$.)

c. Your work on this and the previous exercise should now allow you to make
a general statement about the average value of a linear function (of either
one or two variables) on a domain that is symmetric with respect to the
origin. What is the statement?

9.20. a. Show that the average value $\bar z$ of $z = \sqrt{R^2 - x^2 - y^2}$ on the disk $x^2 + y^2 \le R^2$
is $2R/3$.

b. Sketch the graph of $z = \sqrt{R^2 - x^2 - y^2}$ and describe it in words.

c. On the same axes, sketch the horizontal plane $z = \bar z$. This defines a cylinder
over the disk $x^2 + y^2 \le R^2$ and the volume of this cylinder should equal
the volume of the solid $z \le \sqrt{R^2 - x^2 - y^2}$ over the same disk. Why? Do
the volumes appear equal, or nearly so, in your sketch? (This fact was
discovered by the Greek mathematician Archimedes (c. 287-212 B.C.E.);
it implies that the volume of a ball of radius $R$ is $4\pi R^3/3$.)

9.21. (Here is another single-variable problem that provides a comparison with
an analogous result for a function of two variables.)

a. Show that the average value $\bar y$ of $y = \sqrt{R^2 - x^2}$ on the line $-R \le x \le R$ is
$\pi R/4$.

b. Sketch the graph of $y = \sqrt{R^2 - x^2}$ and describe it in words. On the same
plane sketch the graph $y = \bar y$.

c. According to the definition of $\bar y$, the area of the rectangle under the
horizontal line $y = \bar y$ should equal the area under the graph of $y = \sqrt{R^2 - x^2}$;
why? Does your graph show this?


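9.22. a. Show that

\[ \iint_{\varepsilon^2 \le x^2 + y^2 \le 1} \ln\sqrt{x^2 + y^2}\;dA = -\pi\,(1 - \varepsilon^2 + 2\varepsilon^2\ln\varepsilon)/2. \]

b. Show the improper integral $\displaystyle\iint_{x^2 + y^2 \le 1} \ln\sqrt{x^2 + y^2}\;dA$ converges to $-\pi/2$.

9.23. Use the pullback $x = \cos s$ to show $\displaystyle\int_{1/2}^{1} \frac{dx}{\sqrt{x^2 - x^4}} = \ln(2 + \sqrt3)$.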

9.24. The aim here is to analyze the focal points of the hyperbola $x^2/a^2 - y^2/b^2 = 1$
and the ellipse $x^2/a^2 + y^2/b^2 = 1$.

a. By definition, a hyperbola is the locus of points for which the difference of
the distances to two fixed points (its focal points) is a constant. Show that
the focal points of the hyperbola are $\mathbf p_\pm = (\pm\sqrt{a^2 + b^2}, 0)$ in the following
way: Parametrize the part of the hyperbola for which $x > 0$ as $\mathbf x = (x,y) =
(a\cosh t, b\sinh t)$, and set $D_\pm = \|\mathbf x - \mathbf p_\pm\|$. Show by direct computation that
$D_\pm = \sqrt{a^2 + b^2}\,\cosh t \mp a$ and thus that $D_- - D_+ = 2a$.

b. Conclude that the hyperbolas $x^2/\sin^2 s - y^2/\cos^2 s = 1$ (with $s$ arbitrary)
are confocal with focal points $(\pm1, 0)$.

c. By definition, an ellipse is the locus of points for which the sum of the
distances to the two focal points is a constant. Show that, when $a > b > 0$,
the focal points of the ellipse are $\mathbf p_\pm = (\pm\sqrt{a^2 - b^2}, 0)$. Adapt the approach
you took for the hyperbola.

d. Conclude that the ellipses $x^2/\cosh^2 s + y^2/\sinh^2 s = 1$ (with $s$ arbitrary)
are confocal with focal points $(\pm1, 0)$.

9.25. Use $\cos^2 s + \sinh^2 t = \tfrac12(\cos 2s + \cosh 2t)$ (for example) to compute

\[ \iint (\cos^2 s + \sinh^2 t)\,ds\,dt. \]


9.26. Suppose $f(t)$ is an odd function ($f(-t) = -f(t)$) that is integrable on the
interval $-a \le t \le a$. Use the change of variable $t = -s$ to show

\[ \int_{-a}^{0} f(t)\,dt = \int_{a}^{0} f(s)\,ds \quad\text{and hence}\quad \int_{-a}^{a} f(t)\,dt = 0. \]


9.27. The aim is to show that the map $\varphi^{-1}$ is invertible on the half plane $y > -1/2$,
where

\[ \varphi^{-1}:\ \begin{cases} s = 1 + x + y^2, \\ t = x - y. \end{cases} \]

a. Show that the image of the line $y = a$ under $\varphi^{-1}$ is the line $t = s + b$, where
$b = -1 - a - a^2$. Show that, for any two values of $a > -1/2$, the image
lines are different.

b. Show that $\varphi^{-1}$ is 1-1 on each line $y = a$. Conclude $\varphi^{-1}$ is 1-1 on the entire
half plane $y > -1/2$.


9.28. Show that

\[ \iiint f(x,y,z)\,dx\,dy\,dz = \iiint f(\rho\cos\theta\cos\varphi,\ \rho\sin\theta\cos\varphi,\ \rho\sin\varphi)\;\rho^2\cos\varphi\;d\rho\,d\theta\,d\varphi \]

is the change of variables formula for triple integrals under the spherical
coordinate map s (Exercise 5.10, p. 178).

9.29. Determine the change of variables formula for fourfold integrals under the
map a (Exercise 5.25, p. 183) that is the analogue in $\mathbb R^4$ of the spherical
coordinate change in $\mathbb R^3$.

9.30. Compute both the path integral and the double integral of Green’s theorem 
for P = xy, Q = y, and R the unit square in the (x,y)-plane. 

9.31. a. Use the result of Exercise 9.18.c to show that

\[ \oint_{x^2 + y^2 = R^2} f(x)\,dx + (ax^2 + bxy + cy^2)\,dy = 0 \]

for any function $f(x)$ and any values of $a$, $b$, and $c$.

b. Use the same idea to explain why

\[ \oint_{x^2 + y^2 = R^2} f(x)\,dx + (ax^2 + bxy + cy^2 + \alpha x + \beta y + \gamma)\,dy = \alpha\pi R^2, \]

when $x^2 + y^2 = R^2$ has positive orientation and $f(x)$, $a$, $b$, $c$, $\alpha$, $\beta$, and $\gamma$
are arbitrary.

9.32. Use Green's theorem to evaluate each of the following path integrals.

a. $\oint_C 5y\,dx + 2x\,dy$, $C$: triangle with vertices $(1,5)$, $(9,2)$, $(8,8)$.

b. $\oint_C (x^2 - y^3)\,dx + (x^3 + y^2)\,dy$, $C$: counterclockwise unit circle.

c. $\oint_C y e^x\,dx + x e^y\,dy$, $C$: rectangle with vertices $(-1,1)$, $(7,1)$, $(7,5)$, $(-1,5)$.


9.33. Show that the path integral $\displaystyle\oint_{\partial\vec R} x\,dy$ equals the area of the oriented region $\vec R$.

9.34. Show that the path integrals $\displaystyle\oint_{\partial\vec R} -y\,dx$ and $\displaystyle\frac12\oint_{\partial\vec R} x\,dy - y\,dx$ both also equal
the area of $\vec R$.

9.35. Let $\vec D$ be the elliptical "disk" $x^2/a^2 + y^2/b^2 \le 1$ with positive orientation.

a. Sketch $\vec D$ when $a = 5$, $b = 3$.

b. Find parametric equations $x = x(t)$, $y = y(t)$ for the boundary ellipse $\partial\vec D$.

c. Use $\displaystyle\oint_{\partial\vec D} x\,dy$ and the parametrizations of $\partial\vec D$ to show that $\operatorname{area}\vec D = \pi ab$.

Note that if $b = a$ then $\vec D$ is a circular disk with radius $a = b$ and area $\pi a^2$.


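9.36. Suppose that $H(x,y)$ is a harmonic function; that is, $H$ satisfies the Laplace
equation:

\[ \frac{\partial^2 H}{\partial x^2} + \frac{\partial^2 H}{\partial y^2} = 0. \]

Show that $\displaystyle\oint_{\vec C} \frac{\partial H}{\partial y}\,dx - \frac{\partial H}{\partial x}\,dy = 0$ for any closed curve $\vec C$.

9.37. Show that, under the maps

\[ \mathbf q:\ \begin{cases} x = s^3 - 3st^2, \\ y = 3s^2t - t^3, \end{cases} \qquad
   \mathbf s:\ \begin{cases} x = s^4 - 6s^2t^2 + t^4, \\ y = 4s^3t - 4st^3, \end{cases} \]

the positively oriented unit disk $\vec S$ covers the positively oriented unit disk $\vec D$
three and four times, respectively, and the analogous integrals

\[ \iint_{\vec S} J_{\mathbf q}(s,t)\,ds\,dt \quad\text{and}\quad \iint_{\vec S} J_{\mathbf s}(s,t)\,ds\,dt \]

(where $J_{\mathbf q}$ and $J_{\mathbf s}$ are the Jacobians of $\mathbf q$ and $\mathbf s$) have the values $3\pi$ and $4\pi$.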

9.38. Let $\mathbf g: \mathbb R^2 \to \mathbb R^2: (x,y) \to (u,v)$ be the quadratic map (p. 340), and let the
arrowhead $A = \mathbf g(S)$ be the image of the square $S$: $0.2 \le x \le 1$, $0.2 \le y \le 1$.

a. Sketch $A$ in the $(u,v)$-plane.

b. Show that $\partial(x,y)/\partial(u,v) = 1/4\sqrt{u^2 + v^2}$.

c. Determine $\operatorname{area} A = \iint_A du\,dv = \iint_S 4(x^2 + y^2)\,dx\,dy = 3968/1875$; cf.
Exercise 8.3, p. 313.

d. Show that $\displaystyle\iint_A \frac{du\,dv}{\sqrt{u^2 + v^2}} = \iint_S 4\,dx\,dy = 4\operatorname{area} S = 2.56$.

e. Determine the moments (cf. Exercise 8.22, p. 316) of the arrowhead
around the $v$- and $u$-axes:

\[ \iint_A u\,du\,dv, \qquad \iint_A v\,du\,dv. \]

f. Determine $\displaystyle\iint_A \frac{du\,dv}{v}$ and $\displaystyle\iint_A \frac{du\,dv}{u^2 + v^2}$.

9.39. Let $D$ be the quarter-disk $0 \le x^2 + y^2 \le 1$, $0 \le x$, $0 \le y$ in the first quadrant,
and let $\mathbf g$ be the quadratic map from the previous exercise. Determine

\[ \operatorname{area}\mathbf g(D) = \iint_{\mathbf g(D)} du\,dv \quad\text{and also}\quad \iint_{\mathbf g(D)} (u^2 + v^2)\,du\,dv. \]


The next four exercises are intended to explore the question: How 'infinite' is $1/r$ at
the origin in various dimensions? That is, although $1/r$ is infinite, its integral may
or may not be, depending on the dimension of the space in which the calculation
is done. The following exercise asks you to explore the same question for $1/r^2$ and
then to compare your two sets of results.

9.40. Let $B^1$ be the interval $[-1,1]$ on the $x$-axis. Let $r = \sqrt{x^2} = |x|$. Show that

\[ \int_{B^1} \frac1r\,dx = \int_{-1}^{1} \frac{1}{|x|}\,dx = 2\int_0^1 \frac1x\,dx = \infty. \]

(Think of $B^1$ as the "unit ball" in one dimension.)

9.41. Let $B^2$ be the unit disk in the $(x,y)$-plane: $r^2 = x^2 + y^2 \le 1$. (Think of $B^2$ as
the "unit ball" in two dimensions.) Is

\[ \iint_{B^2} \frac1r\,dx\,dy \]

finite or infinite? Did you make a coordinate change to calculate the integral?

9.42. Let $B^3$ be the unit ball in $(x,y,z)$ space: $r^2 = x^2 + y^2 + z^2 \le 1$. Use an
appropriate change of variables to determine whether

\[ \iiint_{B^3} \frac1r\,dx\,dy\,dz \]

is finite or infinite. Make a conjecture about the integral of $1/r$ over the unit
ball in $\mathbb R^n$.

9.43. Integrate $1/r^2$ over the unit ball in $\mathbb R^n$, $n = 1, 2, 3$.

a. How does the finiteness of the integral of $1/r^2$ depend on $n$?

b. In each dimension, how does the integral of $1/r$ compare to the integral of
$1/r^2$? Is there some sense in which $1/r$ is either "more infinite" or "less
infinite" than $1/r^2$ at the origin?


9.44. Determine

\[ \iiint_{x^2 + y^2 + z^2 \le 1} \frac{x^2 + y^2 + z^2}{1 + x^2 + y^2 + z^2}\,dx\,dy\,dz. \]

Suggestion: Show that

\[ \frac{A}{1 + A} = 1 - \frac{1}{1 + A} \]

and then use this fact.


9.45. a. Determine

\[ \iiint_{R^2 \le x^2 + y^2 + z^2 \le (R + \Delta R)^2} dx\,dy\,dz, \]

where $R > 0$ is fixed and $\Delta R \ll R$ is small. (The domain here is called a
"thin shell".)

b. Show that the integral, which is the volume of the thin shell, equals
$4\pi R^2\,\Delta R + O(\Delta R^2)$.

9.46. Determine

\[ \iiint_{R^2 \le x^2 + y^2 + z^2 \le (R + \Delta R)^2} \frac{1}{x^2 + y^2 + z^2}\,dx\,dy\,dz, \qquad \Delta R \ll R. \]





Chapter 10 

Surface Integrals 


Abstract We turn now to integrals over curved surfaces in space. They are
analogous, in several ways, to integrals over curved paths. Both arise in scientific
problems as ways to express the product of quantities that vary. The first surface integral
we consider measures flux, the amount of fluid flowing through a surface. The
integrand of a surface integral, like a path integral, can be either a scalar or a vector
function: flux is the integral of a vector function, whereas area (another surface
integral) is the integral of a scalar. Also, orientation matters, at least when the
integrand is a vector function.


10.1 Measuring flux 

How much fluid will pass through a plane region $S$ in space? If fluid moves with
constant velocity $\mathbf v$, then during a time interval $\Delta t$ it will fill out an oblique cylinder
with base $S$ and generator $\mathbf v\,\Delta t$. The volume of that cylinder is the product of the
area of its base with the height $h$ perpendicular to that base. Now $h$ equals the length
of the projection of the generator on $\mathbf n$, the unit normal to $S$ in the direction of flow:
$h = \mathbf v\,\Delta t \cdot \mathbf n$. Therefore, if we denote the area of $S$ by $\Delta A$, then the volume of fluid is

\[ \text{volume} = \mathbf v \cdot \mathbf n\,\Delta A\,\Delta t. \]

To determine the amount of fluid (that is, its mass) we just need to factor in its
mass density $\rho$:

\[ \text{mass} = \rho\,\mathbf v \cdot \mathbf n\,\Delta A\,\Delta t. \]

The vector quantity $\mathbf V = \rho\,\mathbf v$ is called the flux density (or flow density) of the fluid.
Flux density is a rate; when $\rho$ is measured in kilograms per cubic meter and velocity
in meters per second, flux density is measured in kilograms per square meter per
second. Its magnitude is the mass of fluid, in kilograms, that flows perpendicularly
through a unit area in unit time. The mass of fluid that crosses the region $S$ in unit
time is called the total flux (or total flow) through $S$; it is the product

\[ \text{total flux} = \mathbf V \cdot \mathbf n\,\Delta A \ \text{kilograms per second.} \]

J.J. Callahan, Advanced Calculus: A Geometric View, Undergraduate Texts in Mathematics,
DOI 10.1007/978-1-4419-7332-0_10, © Springer Science+Business Media, LLC 2010


In general, we allow flux density to vary continuously from point to point, but
require it to be constant in time at any given point: $\mathbf V = \mathbf V(x,y,z)$. Physically, $\mathbf V$ is
called a steady flow; mathematically, it is a continuous vector field on (a portion
of) $\mathbb R^3$. We usually call $\mathbf V$ a flow field.

Our expression for total flux does not yet tell us from which side the fluid
crosses $S$. However, if we fix one of the two unit normals $\mathbf n$ in advance (that is,
before we consider any given fluid flow), then total flux $\mathbf V \cdot \mathbf n\,\Delta A$ becomes a signed
quantity whose value is negative precisely when the fluid crosses $S$ in the direction
opposite $\mathbf n$.


Assigning a unit normal to a plane region $S$ in space is equivalent to orienting it.
To make the connection, we must first explain what it means to orient $S$ in space.
Essentially, it is the same as orienting it in the plane (p. 353): assign to each point
$\mathbf p$ of $S$ an ordered pair $\{\mathbf v_1(\mathbf p), \mathbf v_2(\mathbf p)\}$ of linearly independent vectors that vary
continuously with $\mathbf p$. The vectors are now in $\mathbb R^3$, of course, but we constrain them to
be tangent to $S$ at the point $\mathbf p$. Following earlier practice, we let $\vec S$ denote $S$ with an
orientation.

Next, we must make the connection between orienting $S$ and choosing a unit
normal for it. Suppose the ordered pair $\{\mathbf v_1(\mathbf p), \mathbf v_2(\mathbf p)\}$ orients $S$ at $\mathbf p$. Then, as in the
figure above, we choose

\[ \mathbf n(\mathbf p) = \frac{\mathbf v_1(\mathbf p) \times \mathbf v_2(\mathbf p)}{\|\mathbf v_1(\mathbf p) \times \mathbf v_2(\mathbf p)\|} \]

to be the unit normal to $S$ at $\mathbf p$. On any pathwise-connected component of $S$, both
$\mathbf n(\mathbf p)$ and the orientation of $S$ are constant (Theorem 9.12, p. 354).

If we think of orientation as defining a “sense of rotation” on S (cf. p. 353), then, 
from the side of S on which n lies, that rotation is counterclockwise. This assumes 
that the coordinate frame in R 3 is right-handed, for then the sense of rotation in the 
(x,y)-plane, as viewed from the positive z-axis, is counterclockwise. 

It is equally straightforward to connect the choice of a unit normal to the choice
of an orientation. Once the unit normal $\mathbf n$ for $S$ is given, choose any two linearly
independent vectors $\mathbf v_1$ and $\mathbf v_2$ that are perpendicular to $\mathbf n$ and such that

\[ \{\mathbf v_1, \mathbf v_2, \mathbf n\} \quad\text{or, equivalently,}\quad \{\mathbf n, \mathbf v_1, \mathbf v_2\}, \]

is a positively oriented triple of vectors in $\mathbb R^3$. Then $\mathbf v_1$ and $\mathbf v_2$ are everywhere tangent
to $S$, so we can orient $S$ by assigning $\mathbf v_i(\mathbf p) = \mathbf v_i$, $i = 1, 2$, at every point $\mathbf p$ in $S$ where
$\mathbf n$ is the orienting normal.

The figure above also indicates that the orientation of $S$ induces an orientation
of $\partial S$, just as in $\mathbb R^2$. When we view $S$ from the side toward which the orienting
normal $\mathbf n$ points, then $S$ lies on the left as $\partial S$ is traversed in the positive direction.

Definition 10.1 Let a fluid have constant flux density $\mathbf V$, and let $\vec S$ be a plane region
in space that has finite area $\Delta A$ and orientation given by the unit normal $\mathbf n$. The total
flux of the fluid through $\vec S$ in unit time is

\[ \Phi = \mathbf V \cdot \mathbf n\,\Delta A. \]

This formula has important special cases. Let $\mathbf V = (X,Y,Z)$, and suppose $\vec S = \vec S_x$
lies in the plane $x = \alpha$, has area $\Delta A = \Delta A_x$, and is oriented by the positive $x$-axis:
$\mathbf n = (1,0,0)$. Then total flux through $\vec S_x$ is

\[ \Phi = \Phi_x = (X,Y,Z) \cdot (1,0,0)\,\Delta A_x = X\,\Delta A_x. \]

Here total flux depends only on the component of the flow field $\mathbf V$ that is
perpendicular to the plane $x = \alpha$; the other two components, $Y$ and $Z$, in directions parallel
to that plane, have no effect. In other words, $\mathbf V$ and $\mathbf V^* = (X,0,0)$ have the same
total flux through $\vec S_x$. For a region $\vec S_y$ in the plane $y = \beta$ or $\vec S_z$ in $z = \gamma$, we find,
respectively,

\[ \Phi_y = Y\,\Delta A_y, \qquad \Phi_z = Z\,\Delta A_z. \]

The figure above also shows how a region parallel to each coordinate plane is
oriented when the remaining positive axis is used as the defining normal (and the
three axes together have their usual right-hand orientation):

Plane      Normal     Order of Axes
(y,z)      x-axis     y → z
(x,z)      y-axis     z → x
(x,y)      z-axis     x → y

In two cases, the plane is oriented by its axes in alphabetical order, but in the third,
by the opposite order. As the figure in the margin shows, the correct order is the
cyclic one $x \to y \to z \to x \to \cdots$ that the three coordinate axes have when
they are viewed from the positive orthant (i.e., from the region where $x > 0$, $y > 0$,
and $z > 0$).

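If $\vec S$ is the oriented parallelogram $\mathbf p \wedge \mathbf q$, then its orienting unit normal is

\[ \mathbf n = \frac{\mathbf p \times \mathbf q}{\|\mathbf p \times \mathbf q\|} \]

(if $\mathbf p \times \mathbf q \ne \mathbf 0$) and its area is $\Delta A = \|\mathbf p \times \mathbf q\|$. Therefore, $\mathbf n\,\Delta A = \mathbf p \times \mathbf q$, and total flux
through $\mathbf p \wedge \mathbf q$ takes the simple form

\[ \Phi = \mathbf V \cdot \mathbf p \times \mathbf q = \mathbf p \times \mathbf q \cdot \mathbf V, \]

the scalar triple product of $\mathbf p$, $\mathbf q$, and $\mathbf V$ (cf. p. 43). To compute $\mathbf n$, we need $\mathbf p \times \mathbf q \ne \mathbf 0$;
however, if $\mathbf p \times \mathbf q = \mathbf 0$, then $\Delta A = 0$ and $\Phi = 0$, so the formula $\Phi = \mathbf V \cdot \mathbf p \times \mathbf q$ is still
valid. If

\[ \mathbf V = (X,Y,Z), \qquad \mathbf p = (p_1, p_2, p_3), \qquad \mathbf q = (q_1, q_2, q_3), \]

then (e.g., from the proof of Theorem 2.11, p. 43), we have

\[ \mathbf p \times \mathbf q = \left( \begin{vmatrix} p_2 & p_3 \\ q_2 & q_3 \end{vmatrix},\
   \begin{vmatrix} p_3 & p_1 \\ q_3 & q_1 \end{vmatrix},\
   \begin{vmatrix} p_1 & p_2 \\ q_1 & q_2 \end{vmatrix} \right), \]

\[ \Phi = X \begin{vmatrix} p_2 & p_3 \\ q_2 & q_3 \end{vmatrix}
   + Y \begin{vmatrix} p_3 & p_1 \\ q_3 & q_1 \end{vmatrix}
   + Z \begin{vmatrix} p_1 & p_2 \\ q_1 & q_2 \end{vmatrix}. \]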


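Suppose we project $\vec S$ onto each of the coordinate planes $x = 0$, $y = 0$, and $z = 0$;
the images are parallelograms $\vec S_x$, $\vec S_y$, and $\vec S_z$, respectively, whose areas are the $2 \times 2$
determinants that appear as the components of the vector $\mathbf p \times \mathbf q$ (see p. 44):

\[ \Delta A_x = \begin{vmatrix} p_2 & p_3 \\ q_2 & q_3 \end{vmatrix}, \qquad
   \Delta A_y = \begin{vmatrix} p_3 & p_1 \\ q_3 & q_1 \end{vmatrix}, \qquad
   \Delta A_z = \begin{vmatrix} p_1 & p_2 \\ q_1 & q_2 \end{vmatrix}. \]

Each of these is the signed area of an oriented parallelogram whose orientation is
determined by the coordinate plane in which it lies. From the discussion above, we
know that the value of the total flux through each of $\vec S_x$, $\vec S_y$, and $\vec S_z$ can be written as

\[ \Phi_x = X\,\Delta A_x, \qquad \Phi_y = Y\,\Delta A_y, \qquad \Phi_z = Z\,\Delta A_z. \]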


These are the components of total flux through $\vec S$: $\Phi = \Phi_x + \Phi_y + \Phi_z$.

An example helps clarify these ideas. To simplify the picture as much as possible,
we work with triangles instead of parallelograms. Let $\vec T$ be the triangle spanned by
a pair of vectors $\mathbf p$ and $\mathbf q$ and oriented by $\mathbf p \times \mathbf q$. In the figure below,

\[ \mathbf p = (-6, 0, -2), \qquad \mathbf q = (-6, 4, 0), \qquad\text{and}\qquad \mathbf V = (-1, 1, 3); \]

$\mathbf p$ and $\mathbf q$ are placed so that each edge of $\vec T$ lies in one of the coordinate planes.
Consequently, $\vec T$ and its projections $\vec T_x$, $\vec T_y$, and $\vec T_z$ form a tetrahedron. We have

\[ \mathbf p \times \mathbf q = (8, 12, -24). \]

The triangle $\vec T$ has half the area of the parallelogram $\mathbf p \wedge \mathbf q$; therefore total flux
through $\vec T$ is

\[ \Phi = \tfrac12\,\mathbf V \cdot \mathbf p \times \mathbf q = \tfrac12(-8 + 12 - 72) = -34. \]

Notice that, in the figure, the boundary of $\vec T$ has clockwise, or negative, orientation
as we view it from the side on which $\mathbf V$ lies. This confirms that $\Phi$ must be negative.
Furthermore, $\|\mathbf p \times \mathbf q\| = 28$, so

\[ \mathbf n = \tfrac17(2, 3, -6) \qquad\text{and}\qquad \Delta A = \operatorname{area}\vec T = \|\mathbf p \times \mathbf q\|/2 = 14. \]

We can read off the signed areas of $\vec T_x$, $\vec T_y$, and $\vec T_z$ as one-half of the corresponding
component of $\mathbf p \times \mathbf q$:

\[ \Delta A_x = 4, \qquad \Delta A_y = 6, \qquad \Delta A_z = -12. \]

The signs here confirm our direct observations: the boundaries of $\vec T_x$ and $\vec T_y$ have
positive (counterclockwise) orientation with respect to the positive $x$- and $y$-axes,
but $\vec T_z$ has negative (clockwise) orientation with respect to the positive $z$-axis. The
total flux through each of these faces is

\[ \Phi_x = -1 \times \Delta A_x = -4, \qquad \Phi_y = +1 \times \Delta A_y = +6, \qquad \Phi_z = +3 \times \Delta A_z = -36. \]

Of the three faces, total flux is positive only through $\vec T_y$, because only on that face
does the component of $\mathbf V$ (shown in outline in the figure, lying inside the
tetrahedron) point in the same direction as the orienting normal. Finally, because flux
density $\mathbf V$ is constant, no fluid accumulates in the tetrahedron: the fluid that flows
through the one face $\vec T$ must equal the total that flows through the other three that
have the same boundary as $\vec T$:

\[ \Phi = \Phi_x + \Phi_y + \Phi_z. \]
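The arithmetic of the tetrahedron example is easy to reproduce. The following sketch uses only the data given above; the helper names `cross` and `dot` are illustrative. It computes $\mathbf p \times \mathbf q$, the signed areas of the three projected triangles, the flux through each face, and the total flux through $\vec T$.

```python
from fractions import Fraction as F

def cross(p, q):
    """Cross product of two vectors in R^3."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dot(a, b):
    """Dot product of two vectors in R^3."""
    return sum(x * y for x, y in zip(a, b))

p = (-6, 0, -2)
q = (-6, 4, 0)
V = (-1, 1, 3)          # constant flux density

pxq = cross(p, q)
# Each projected triangle has half the signed area of the corresponding
# projected parallelogram, i.e. half a component of p x q.
signed_areas = tuple(F(c, 2) for c in pxq)
# Flux through the faces T_x, T_y, T_z: one component of V times one signed area.
flux_components = tuple(v * a for v, a in zip(V, signed_areas))
# Flux through T itself: half the scalar triple product V . (p x q).
flux_T = F(dot(V, pxq), 2)

print(pxq, signed_areas, flux_components, flux_T)
```

The printed values should match the example: $\mathbf p \times \mathbf q = (8, 12, -24)$, signed areas $4$, $6$, $-12$, face fluxes $-4$, $+6$, $-36$, and total flux $-34$, which is also the sum of the three face fluxes.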


Now suppose that the oriented surface $\vec S$ is curved rather than flat. To be definite,
let $\vec S$ be a parametrized surface patch. Thus we begin with a continuously
differentiable 1-1 immersion $\mathbf f: \Omega \to \mathbb R^3$, where $\Omega$ is an open set in $\mathbb R^2$. The condition that
$\mathbf f$ be an immersion means (Definition 6.8, p. 212) that the derivative $d\mathbf f_{(a,b)}$ is itself
1-1 (or, in this case, has maximal rank 2) for every $(a,b)$ in $\Omega$. This guarantees that
the image of $\mathbf f$ is fully 2-dimensional everywhere; see the discussion on page 128.

Let $\vec U$ be a closed, bounded, and positively oriented subset of $\Omega$ that has area.
Because $\mathbf f$ is a 1-1 immersion, the orientation on $\vec U$ will induce an orientation on its
image $\mathbf f(U)$, exactly as on page 355.



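[Figure: the positively oriented domain $U$ with the basis vectors $\mathbf{e}_1$, $\mathbf{e}_2$ at a point $\mathbf{p}$, and its image $S = \mathbf{f}(U)$ with the tangent vectors $d\mathbf{f}_{\mathbf{p}}(\mathbf{e}_1)$, $d\mathbf{f}_{\mathbf{p}}(\mathbf{e}_2)$.]


Induced orientation


Oriented surface patch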


Theorem 10.1. If the vectors $\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ determine the orientation of $U$, then
their images

$$\{d\mathbf{f}_{\mathbf{p}}(\mathbf{v}_1(\mathbf{p})),\ d\mathbf{f}_{\mathbf{p}}(\mathbf{v}_2(\mathbf{p}))\}$$

determine an orientation of $\mathbf{f}(U)$ that is called the induced orientation.

Proof. First of all, the image vectors are tangent to $\mathbf{f}(U)$ at $\mathbf{f}(\mathbf{p})$. Second, they vary
continuously with the point $\mathbf{f}(\mathbf{p})$ because the $\mathbf{v}_i(\mathbf{p})$ vary continuously with $\mathbf{p}$, and $\mathbf{f}$ is
continuously differentiable. Third, they are linearly independent because $\mathbf{f}$ is an
immersion at $\mathbf{p}$. □

Definition 10.2 We say $S$ is an oriented surface patch if $S = \mathbf{f}(U)$, where the map
$\mathbf{f}: \Omega \to \mathbb{R}^3$ is a continuously differentiable 1-1 immersion on an open set $\Omega$ in $\mathbb{R}^2$,
$U \subset \Omega$ is a closed, bounded, positively oriented set with area, and $S$ has the induced
orientation.


By the discussion on pages 388-389, we can always replace an ordered pair
$\{d\mathbf{f}_{(a,b)}(\mathbf{v}_1),\ d\mathbf{f}_{(a,b)}(\mathbf{v}_2)\}$ of tangent vectors by the normal

$$d\mathbf{f}_{(a,b)}(\mathbf{v}_1) \times d\mathbf{f}_{(a,b)}(\mathbf{v}_2)$$

to orient $S$ at the point $\mathbf{f}(a,b)$. Let us orient $U$ in $\mathbb{R}^2$ using just the standard basis
vectors, by assigning the pair $\{\mathbf{e}_1, \mathbf{e}_2\}$ to every point $\mathbf{p}$ (as in the figure above). Then
the vectors $d\mathbf{f}_{(a,b)}(\mathbf{e}_1)$ and $d\mathbf{f}_{(a,b)}(\mathbf{e}_2)$ that orient $S$ are the columns of the matrix
$d\mathbf{f}_{(a,b)}$, in that order. Hence, if

$$\mathbf{f}: \begin{cases} x = x(u,v),\\ y = y(u,v),\\ z = z(u,v), \end{cases} \qquad d\mathbf{f}_{(a,b)} = \begin{pmatrix} x_u(a,b) & x_v(a,b) \\ y_u(a,b) & y_v(a,b) \\ z_u(a,b) & z_v(a,b) \end{pmatrix}$$

parametrizes the oriented surface patch $S$, then the cross-product of the column
vectors of $d\mathbf{f}_{(a,b)}$ determines the orienting normal for $S$ at $\mathbf{f}(a,b)$:

$$\mathbf{N}_{\mathbf{f}}(a,b) = \left( \begin{vmatrix} y_u(a,b) & y_v(a,b) \\ z_u(a,b) & z_v(a,b) \end{vmatrix},\ \begin{vmatrix} z_u(a,b) & z_v(a,b) \\ x_u(a,b) & x_v(a,b) \end{vmatrix},\ \begin{vmatrix} x_u(a,b) & x_v(a,b) \\ y_u(a,b) & y_v(a,b) \end{vmatrix} \right) = \left( \frac{\partial(y,z)}{\partial(u,v)},\ \frac{\partial(z,x)}{\partial(u,v)},\ \frac{\partial(x,y)}{\partial(u,v)} \right)_{(a,b)}.$$

Because $\mathbf{f}$ is an immersion everywhere on $\Omega$, the columns of $d\mathbf{f}_{(a,b)}$ are linearly
independent and, therefore, $\mathbf{N}_{\mathbf{f}}(a,b) \ne \mathbf{0}$.
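The orienting normal can be estimated numerically as the cross product of the two columns of the derivative. The sketch below is illustrative only: the function names, the central-difference step `h`, and the paraboloid test surface are our assumptions, not part of the text.

```python
def cross(p, q):
    """Cross product p x q of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def orienting_normal(f, a, b, h=1e-5):
    """Estimate N_f(a, b) as the cross product of the columns of df_(a,b).

    The columns f_u and f_v are approximated by central differences, so the
    three components approximate the Jacobians d(y,z)/d(u,v), d(z,x)/d(u,v)
    and d(x,y)/d(u,v), in that order.
    """
    fu = [(p - m) / (2 * h) for p, m in zip(f(a + h, b), f(a - h, b))]
    fv = [(p - m) / (2 * h) for p, m in zip(f(a, b + h), f(a, b - h))]
    return cross(fu, fv)

# Hypothetical example: the graph z = u^2 + v^2, for which the three
# Jacobians work out by hand to (-2u, -2v, 1).
def paraboloid(u, v):
    return (u, v, u * u + v * v)

N = orienting_normal(paraboloid, 1.0, 2.0)
print(N)  # close to (-2, -4, 1)
```

For a graph $z = h(u,v)$ the third component is always 1, so the induced orientation is the upward one.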

From a parametrization $\mathbf{f}$ of $S$ we can always construct a parametrization $\mathbf{f}^*$ of the
oppositely oriented patch $-S$ by reversing the order of the parameters. Specifically,
let $L: \mathbb{R}^2 \to \mathbb{R}^2 : (s,t) \to (u,v)$ be the reflection

$$(u,v) = L(s,t) = (t,s),$$

and let $\Omega^* = L^{-1}(\Omega)$ and $U^* = L^{-1}(U)$ as sets. Let $U$ and $U^*$ both be positively
oriented; then, because $L$ reverses orientation, $L(U^*) = -U$. The final step is to let
$\mathbf{f}^* = \mathbf{f} \circ L$; then $\mathbf{f}^*$ is defined on $\Omega^*$ and

$$\mathbf{f}^*(U^*) = \mathbf{f} \circ L(U^*) = \mathbf{f}(-U) = -S.$$


Now let $S$ be an oriented surface patch in $(x,y,z)$-space, and suppose a fluid with
continuously varying flux density $\mathbf{V}(x,y,z)$ flows through $S$. Our goal is to determine
the total flux of $\mathbf{V}$ through $S$. If we first approximate $S$ by a collection of oriented
parallelograms, then total flux through those parallelograms gives us an estimate of
the total flux through $S$. To get the parallelograms, partition the parameter domain
$U$ with one of the grids $\mathcal{J}_k$ that are used to define Jordan content in the plane (cf.
p. 281), and let $Q$ be the square cell of $\mathcal{J}_k$ whose lower-left corner is at the point
$(a, b)$, positively oriented as a part of $U$. The image of $Q$ under the linear map $d\mathbf{f}_{(a,b)}$
is an oriented parallelogram $P$ in $\mathbb{R}^3$ that is tangent to $S$ at $\mathbf{f}(a,b)$ and has one corner
there. (If $k$ is large enough, every $Q$ that meets $U$ will lie entirely within $\Omega$, so $d\mathbf{f}_{(a,b)}$
will be defined.) See the next figure.


Orienting normal 
of the parametrization 


Parametrizing —S 


Estimating total flux

Approximating S
by parallelograms


Total flux through a 
single parallelogram 



As $Q$ ranges over all the cells of $\mathcal{J}_k$ that meet $U$, the image parallelograms make
up a collection of plates attached to $S$ at their points of tangency, as in the figure
above. The plates resemble the scales that cover the skin of a reptile or armadillo.
The figure shows the surface patch $S$ first by itself, then with the parallelograms from
$\mathcal{J}_1$ attached, and finally with the parallelograms from $\mathcal{J}_2$ attached. We see each set
of parallelograms from two different viewpoints. The figures suggest that the plates
give us a rough approximation to the surface, an approximation that improves as the
plates become smaller and more numerous, that is, as $k$ increases.

To estimate the total flux through a single parallelogram $P = d\mathbf{f}_{(a,b)}(Q)$, note that
the edges of $Q$ are multiples of the basis vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ in $\mathbb{R}^2$. If we write those
edges as

$$\Delta u\,\mathbf{e}_1 \quad\text{and}\quad \Delta v\,\mathbf{e}_2,$$

where $\Delta u = \Delta v = 1/2^k > 0$ when $Q$ is a cell in $\mathcal{J}_k$, then we can write the edges
of $P$ as

$$\mathbf{p} = \Delta u\, d\mathbf{f}_{(a,b)}(\mathbf{e}_1) = \Delta u \begin{pmatrix} x_u(a,b) \\ y_u(a,b) \\ z_u(a,b) \end{pmatrix}, \qquad \mathbf{q} = \Delta v\, d\mathbf{f}_{(a,b)}(\mathbf{e}_2) = \Delta v \begin{pmatrix} x_v(a,b) \\ y_v(a,b) \\ z_v(a,b) \end{pmatrix}.$$

Therefore,

$$\mathbf{p} \times \mathbf{q} = \mathbf{N}_{\mathbf{f}}(a,b)\,\Delta u\,\Delta v = \mathbf{n}\,\Delta A,$$

where

$$\mathbf{N}_{\mathbf{f}}(a,b) = \left( \frac{\partial(y,z)}{\partial(u,v)},\ \frac{\partial(z,x)}{\partial(u,v)},\ \frac{\partial(x,y)}{\partial(u,v)} \right)_{(a,b)}$$

is the orienting normal for $S$ at $\mathbf{f}(a,b)$ (p. 393). By the geometric definition of the
cross product $\mathbf{p} \times \mathbf{q}$, $\mathbf{n}$ is the unit normal in the same direction as $\mathbf{N}_{\mathbf{f}}(a,b)$ and $\Delta A$ is
the ordinary area of the unoriented parallelogram $P$.

It remains for us to apply the formula $\Phi = \mathbf{V} \cdot \mathbf{p} \times \mathbf{q}$ on $P$, but this requires the
flow field $\mathbf{V}$ to be constant. We can get a constant by replacing

$$\mathbf{V}(x,y,z) = (X(x,y,z),\ Y(x,y,z),\ Z(x,y,z))$$

everywhere on $P$ by the single value $\mathbf{V}(\mathbf{f}(a,b))$ that $\mathbf{V}$ takes on at the corner where
$P$ is attached to $S$. By hypothesis, $\mathbf{V}(x,y,z)$ is continuous, so the error caused by this
replacement can be made as small as we wish by taking $Q$ sufficiently small. We
now find

$$\Phi(P) \approx \mathbf{V}(\mathbf{f}(a,b)) \cdot \mathbf{p} \times \mathbf{q} = \left( X\,\frac{\partial(y,z)}{\partial(u,v)} + Y\,\frac{\partial(z,x)}{\partial(u,v)} + Z\,\frac{\partial(x,y)}{\partial(u,v)} \right)_{(a,b)} \Delta u\,\Delta v.$$

The right-hand side is a constant determined by $(a,b)$: the three Jacobians are
evaluated at $(u,v) = (a,b)$, and $X$, $Y$, and $Z$ are evaluated at the point $(x,y,z) =
(x(a,b), y(a,b), z(a,b))$.

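We can now estimate total flux through the oriented surface patch $S$ by adding
up the contributions from all the cells $Q_1, \dots, Q_I$ that meet the domain $U$. Let the
lower-left corner of $Q_i$ be at the point $(u_i, v_i)$, and let $P_i = d\mathbf{f}_{(u_i,v_i)}(Q_i)$; then

$$\sum_{i=1}^{I} \Phi(P_i) \approx \sum_{i=1}^{I} \left( X\,\frac{\partial(y,z)}{\partial(u,v)} + Y\,\frac{\partial(z,x)}{\partial(u,v)} + Z\,\frac{\partial(x,y)}{\partial(u,v)} \right)_{(u_i,v_i)} \Delta u\,\Delta v.$$

This is a Riemann sum for the oriented integral

$$\iint_{\vec{U}} \left( X\,\frac{\partial(y,z)}{\partial(u,v)} + Y\,\frac{\partial(z,x)}{\partial(u,v)} + Z\,\frac{\partial(x,y)}{\partial(u,v)} \right)_{(u,v)} du\,dv.$$

Because the integrand is continuous, the integral exists and the Riemann sums
converge to it as $k \to \infty$ (Theorem 8.35, p. 305).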
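The Riemann sum above translates directly into a short program. What follows is a minimal sketch under stated assumptions: the parameter domain is taken to be a rectangle cut into `n` by `n` cells (rather than a general region with a dyadic grid), each cell is sampled at its midpoint rather than its lower-left corner, the Jacobians are estimated by central differences, and the names `total_flux`, `orienting_normal`, and the paraboloid test patch are ours.

```python
def cross(p, q):
    """Cross product p x q of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def orienting_normal(f, u, v, h=1e-5):
    """Central-difference estimate of N_f(u, v) = f_u x f_v."""
    fu = [(p - m) / (2 * h) for p, m in zip(f(u + h, v), f(u - h, v))]
    fv = [(p - m) / (2 * h) for p, m in zip(f(u, v + h), f(u, v - h))]
    return cross(fu, fv)

def total_flux(V, f, u_range, v_range, n=100):
    """Riemann sum for the flux of V through f(U), U = u_range x v_range.

    Each of the n*n cells contributes V(f(u,v)) . N_f(u,v) du dv, sampled at
    the cell midpoint.
    """
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = u0 + (i + 0.5) * du, v0 + (j + 0.5) * dv
            N = orienting_normal(f, u, v)
            Vp = V(*f(u, v))
            total += sum(a * b for a, b in zip(Vp, N)) * du * dv
    return total

# Hypothetical test case: the patch z = u^2 + v^2 over the unit square,
# whose orienting normal is (-2u, -2v, 1).
paraboloid = lambda u, v: (u, v, u * u + v * v)
flux_up = total_flux(lambda x, y, z: (0.0, 0.0, 1.0), paraboloid, (0, 1), (0, 1))
flux_xy = total_flux(lambda x, y, z: (x, y, 0.0), paraboloid, (0, 1), (0, 1))
print(flux_up, flux_xy)
```

For the vertical field the integrand is identically 1, so the sum should be close to 1; for $\mathbf{V} = (x, y, 0)$ the integrand is $-2u^2 - 2v^2$, whose integral over the unit square is $-4/3$.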

Definition 10.3 Suppose $(x,y,z) = \mathbf{f}(u,v)$ parametrizes the oriented surface patch
$S = \mathbf{f}(U)$, and $\mathbf{V} = (X, Y, Z)$ is a continuous flow field defined on an open set
containing $S$; then the total flux of $\mathbf{V}$ through $S$ for the given parametrization is the
oriented integral

$$\Phi_{\mathbf{f}} = \iint_{\vec{U}} \left[ X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)} + Y(\mathbf{f}(u,v))\,\frac{\partial(z,x)}{\partial(u,v)} + Z(\mathbf{f}(u,v))\,\frac{\partial(x,y)}{\partial(u,v)} \right] du\,dv.$$

Total flux for a given
parametrization of S

Example 1: radial flow
out of a sphere

The notation suggests that the value of $\Phi_{\mathbf{f}}$ depends on the parametrization $\mathbf{f}$.
However, if total flux through a surface is to be physically meaningful, its value should
be independent of the parametrization of that surface. Before we show that it is, we
calculate $\Phi$ for two examples.

Let us determine the total flux of $\mathbf{V} = (X,Y,Z) = (Cx, Cy, Cz)$ (where $C$ is a
constant) through the unit sphere $S$, parametrized as

$$\mathbf{f}: \begin{cases} x = \cos\theta\cos\varphi,\\ y = \sin\theta\cos\varphi,\\ z = \sin\varphi; \end{cases} \qquad \begin{aligned} -\pi &\le \theta \le \pi,\\ -\pi/2 &\le \varphi \le \pi/2. \end{aligned}$$

(Strictly speaking, a surface patch can cover only a portion of the sphere; $\mathbf{f}$ is
not 1-1. However, no essential error is introduced by using this parametrization;
see pages 417-419. It is simpler to compute flux through the whole sphere.) The
flow field $\mathbf{V}$ is radial; each vector points away from the origin with a magnitude
proportional to its distance from the origin. Thus, although $\mathbf{V}$ varies, it is
everywhere normal to the sphere and has constant magnitude $\|\mathbf{V}\| = C$ there. It follows
directly, without calculating the integral, that

$$\Phi = \|\mathbf{V}\| \times \text{area}\,S = 4\pi C.$$

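Let us compare this with the value provided by the integral. Because

$$\frac{\partial(y,z)}{\partial(\theta,\varphi)} = \begin{vmatrix} \cos\theta\cos\varphi & -\sin\theta\sin\varphi \\ 0 & \cos\varphi \end{vmatrix} = \cos\theta\cos^2\varphi,$$

$$\frac{\partial(z,x)}{\partial(\theta,\varphi)} = \begin{vmatrix} 0 & \cos\varphi \\ -\sin\theta\cos\varphi & -\cos\theta\sin\varphi \end{vmatrix} = \sin\theta\cos^2\varphi,$$

$$\frac{\partial(x,y)}{\partial(\theta,\varphi)} = \begin{vmatrix} -\sin\theta\cos\varphi & -\cos\theta\sin\varphi \\ \cos\theta\cos\varphi & -\sin\theta\sin\varphi \end{vmatrix} = \sin\varphi\cos\varphi,$$

the integrand is

$$\begin{aligned} &C\cos\theta\cos\varphi \cdot \cos\theta\cos^2\varphi + C\sin\theta\cos\varphi \cdot \sin\theta\cos^2\varphi + C\sin\varphi \cdot \sin\varphi\cos\varphi \\ &\quad = C(\cos^2\theta\cos^3\varphi + \sin^2\theta\cos^3\varphi + \sin^2\varphi\cos\varphi) \\ &\quad = C(\cos^2\varphi + \sin^2\varphi)\cos\varphi = C\cos\varphi, \end{aligned}$$

and therefore the integral equals

$$\int_{-\pi}^{\pi} \left( \int_{-\pi/2}^{\pi/2} C\cos\varphi\,d\varphi \right) d\theta = \int_{-\pi}^{\pi} 2C\,d\theta = 4\pi C.$$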


Note that the orienting normal given by $\mathbf{f}$ at the point $(x,y,z)$ is

$$\mathbf{N}_{\mathbf{f}} = \left( \frac{\partial(y,z)}{\partial(\theta,\varphi)},\ \frac{\partial(z,x)}{\partial(\theta,\varphi)},\ \frac{\partial(x,y)}{\partial(\theta,\varphi)} \right) = \cos\varphi \cdot (x,y,z).$$

This is, of course, a multiple of the radius vector $(x,y,z)$. Moreover, it is a positive
multiple (at least when $-\pi/2 < \varphi < \pi/2$), so $\mathbf{N}_{\mathbf{f}}$ is an outward normal on the sphere.

For a second example, consider a constant flow $\mathbf{V} = (A,B,C)$ through the same
sphere with the same parametrization. Because the flow is constant, all the fluid that
enters on one side of the sphere exits on the other. That is, inflow equals outflow, so
we expect the net flux through the whole sphere to be zero: $\Phi = 0$. The integral is

$$\Phi = \int_{-\pi}^{\pi} \int_{-\pi/2}^{\pi/2} \left( A\cos\theta\cos^2\varphi + B\sin\theta\cos^2\varphi + C\sin\varphi\cos\varphi \right) d\varphi\,d\theta,$$

and can be dealt with one term at a time. The first is

$$A \int_{-\pi}^{\pi} \cos\theta\,d\theta \int_{-\pi/2}^{\pi/2} \cos^2\varphi\,d\varphi = A \times 0 \times \pi/2 = 0.$$

For similar reasons, the second and third terms also equal zero, so $\Phi = 0$.

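Our calculation of $\Phi$ for a given surface is tied to a parametrization of that
surface. If we change the parametrization, will $\Phi$ change as well? Consider what
happens when we revisit Example 1 with a different parametrization for the sphere. Let
$\mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^3$ be given by

$$\mathbf{g}: \begin{cases} x = \dfrac{2u}{1 + u^2 + v^2},\\[2mm] y = \dfrac{2v}{1 + u^2 + v^2},\\[2mm] z = \dfrac{1 - u^2 - v^2}{1 + u^2 + v^2}. \end{cases}$$

Because $\|\mathbf{g}(u,v)\|^2 = 1$ for every $(u,v)$ in $\mathbb{R}^2$, the image of $\mathbf{g}$ is some part of the unit
sphere. In fact, $\mathbf{g}(\mathbb{R}^2)$ covers the entire sphere except for the south pole $(x,y,z) =
(0,0,-1)$ (see Exercise 10.6). After some calculations (and setting $D = 1 + u^2 + v^2$
to simplify the expressions), we find

$$\frac{\partial(y,z)}{\partial(u,v)} = \frac{8u}{D^3}, \qquad \frac{\partial(z,x)}{\partial(u,v)} = \frac{8v}{D^3}, \qquad \frac{\partial(x,y)}{\partial(u,v)} = \frac{4(1 - u^2 - v^2)}{D^3}.$$

This means that the orienting normal for $\mathbf{g}$ at the point $(x,y,z)$ is

$$\mathbf{N}_{\mathbf{g}} = \left( \frac{\partial(y,z)}{\partial(u,v)},\ \frac{\partial(z,x)}{\partial(u,v)},\ \frac{\partial(x,y)}{\partial(u,v)} \right) = \frac{4}{D^2}\,(x,y,z).$$

Because $4/D^2 > 0$, $\mathbf{N}_{\mathbf{g}}$ is a positive multiple of the radius vector $(x,y,z)$ and is thus,
like $\mathbf{N}_{\mathbf{f}}$, an outward normal. That is, $\mathbf{g}$ and $\mathbf{f}$ induce the same orientation of the sphere.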


Example 2: constant
flow through a sphere

Does Φ depend on the
parametrization of S?

Preview:
invariance of Φ

Comparing
parametrizations

Let us now calculate $\Phi$ using $\mathbf{g}$ instead of $\mathbf{f}$. The integrand is

$$\frac{16Cu^2 + 16Cv^2 + 4C(1 - u^2 - v^2)^2}{D^4} = \frac{4C(1 + u^2 + v^2)^2}{D^4} = \frac{4C}{D^2},$$

so the integral itself is

$$\iint_{\mathbb{R}^2} \frac{4C\,du\,dv}{(1 + u^2 + v^2)^2} = \int_0^{2\pi}\!\!\int_0^{\infty} \frac{4C\rho\,d\rho\,d\theta}{(1 + \rho^2)^2} = 2\pi \cdot \left. \frac{-2C}{1 + \rho^2} \right|_0^{\infty} = 4\pi C$$

under a change to polar coordinates. We find that the two parametrizations of $S$ give
the same value for $\Phi$.
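The agreement between the two parametrizations can also be observed numerically. The sketch below is an illustration under our own assumptions: to keep the stereographic parameter domain bounded we compare fluxes through the upper hemisphere only (where both values should be close to $2\pi C$), we write the stereographic map in polar form so that its domain is a rectangle, and we use midpoint Riemann sums with central-difference Jacobians. The function names are hypothetical.

```python
import math

def cross(p, q):
    """Cross product p x q of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def orienting_normal(f, u, v, h=1e-5):
    """Central-difference estimate of N_f(u, v) = f_u x f_v."""
    fu = [(p - m) / (2 * h) for p, m in zip(f(u + h, v), f(u - h, v))]
    fv = [(p - m) / (2 * h) for p, m in zip(f(u, v + h), f(u, v - h))]
    return cross(fu, fv)

def total_flux(V, f, u_range, v_range, n=100):
    """Midpoint Riemann sum of V(f(u,v)) . N_f(u,v) du dv over a rectangle."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = u0 + (i + 0.5) * du, v0 + (j + 0.5) * dv
            N = orienting_normal(f, u, v)
            total += sum(a * b for a, b in zip(V(*f(u, v)), N)) * du * dv
    return total

C = 1.0
radial = lambda x, y, z: (C * x, C * y, C * z)

def f(theta, phi):
    """Latitude-longitude parametrization of the unit sphere."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            math.sin(phi))

def g_polar(rho, t):
    """Stereographic map g(u, v) with (u, v) = (rho cos t, rho sin t)."""
    u, v = rho * math.cos(t), rho * math.sin(t)
    D = 1 + u * u + v * v
    return (2 * u / D, 2 * v / D, (1 - u * u - v * v) / D)

# Upper hemisphere: 0 <= phi <= pi/2 for f, and 0 <= rho <= 1 for g.
flux_f = total_flux(radial, f, (-math.pi, math.pi), (0, math.pi / 2))
flux_g = total_flux(radial, g_polar, (0, 1), (0, 2 * math.pi))
print(flux_f, flux_g, 2 * math.pi * C)
```

The polar substitution has Jacobian $\rho \ge 0$, so it does not change the orientation that $\mathbf{g}$ induces; both sums approximate the outward flux through the same hemisphere.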

What we have just seen is true in general: total flux is independent of the para- 
metrization, at least for parametrizations that induce the same orientation. To prove 
this, we first show that an orientation-preserving coordinate change in the source 
gives a new parametrization that induces the same orientation on S and yields the 
same value for total flux. Then we show that any two parametrizations that induce 
the same orientation on S are related by an orientation-preserving coordinate change 
in their sources (and thus give the same total flux). 

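In the following theorems, $\mathbf{f}: \Omega \to \mathbb{R}^3$ and $\mathbf{g}: \Omega^* \to \mathbb{R}^3$ both parametrize the
oriented surface patch $S$; they have coordinate functions

$$\mathbf{f}: \begin{cases} x = x(u,v),\\ y = y(u,v),\\ z = z(u,v), \end{cases} \qquad \mathbf{g}: \begin{cases} x = \xi(s,t),\\ y = \eta(s,t),\\ z = \zeta(s,t), \end{cases}$$

and orienting normals

$$\mathbf{N}_{\mathbf{f}}(u,v) = \left( \frac{\partial(y,z)}{\partial(u,v)},\ \frac{\partial(z,x)}{\partial(u,v)},\ \frac{\partial(x,y)}{\partial(u,v)} \right), \qquad \mathbf{N}_{\mathbf{g}}(s,t) = \left( \frac{\partial(\eta,\zeta)}{\partial(s,t)},\ \frac{\partial(\zeta,\xi)}{\partial(s,t)},\ \frac{\partial(\xi,\eta)}{\partial(s,t)} \right).$$

Thus $\mathbf{f}$ and $\mathbf{g}$ are 1-1 immersions on their domains, $\mathbf{f}(\Omega) = \mathbf{g}(\Omega^*)$, and $S =
\mathbf{f}(U) = \mathbf{g}(U^*)$, where $U$ and $U^*$ both have area and are closed, bounded, and
positively oriented subsets of $\Omega$ and $\Omega^*$, respectively. For the flow field $\mathbf{V} =
(X(x,y,z), Y(x,y,z), Z(x,y,z))$, we define

$$\Phi_{\mathbf{f}} = \iint_{\vec{U}} \left[ X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)} + Y(\mathbf{f}(u,v))\,\frac{\partial(z,x)}{\partial(u,v)} + Z(\mathbf{f}(u,v))\,\frac{\partial(x,y)}{\partial(u,v)} \right] du\,dv,$$

$$\Phi_{\mathbf{g}} = \iint_{\vec{U}^*} \left[ X(\mathbf{g}(s,t))\,\frac{\partial(\eta,\zeta)}{\partial(s,t)} + Y(\mathbf{g}(s,t))\,\frac{\partial(\zeta,\xi)}{\partial(s,t)} + Z(\mathbf{g}(s,t))\,\frac{\partial(\xi,\eta)}{\partial(s,t)} \right] ds\,dt.$$

In the first theorem, $\mathbf{g}$ is constructed from $\mathbf{f}$ by a coordinate change, that is, by a map
$\varphi: \Omega^* \to \Omega : (s,t) \to (u,v)$ that is continuously differentiable on an open set $\Omega^*$ in
$\mathbb{R}^2$ and has a continuously differentiable inverse on $\Omega$.

Theorem 10.2. Suppose $\mathbf{f}: \Omega \to \mathbb{R}^3 : (u,v) \to (x,y,z)$ parametrizes the surface
patch $S = \mathbf{f}(U)$, and let $\varphi: \Omega^* \to \Omega : (s,t) \to (u,v)$ be a coordinate change. Set
$\mathbf{g}(s,t) = \mathbf{f}(\varphi(s,t))$ on $\Omega^*$. If the Jacobian of $\varphi$, $\partial(u,v)/\partial(s,t)$, is everywhere
positive on $\Omega^*$, then $U^* = \varphi^{-1}(U)$ is positively oriented, $\mathbf{g}$ parametrizes $S$ as an oriented
surface patch, and $\Phi_{\mathbf{g}} = \Phi_{\mathbf{f}}$.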



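Parametrizations from
coordinate changes

Proof. Because $\varphi$ is a coordinate change, $\mathbf{g}$ is a 1-1 immersion on $\Omega^*$ and $U^* =
\varphi^{-1}(U)$ has area. Furthermore, $\varphi$ preserves orientation because its Jacobian is
positive; hence $U^*$ is positively oriented (Theorem 9.13, p. 355). Because $\mathbf{g}(U^*) =
\mathbf{f}(U) = S$, $\mathbf{g}$ parametrizes $S$ as an oriented surface patch.

It remains for us to prove that $\Phi_{\mathbf{g}} = \Phi_{\mathbf{f}}$. For this, it is helpful to write $(x,y,z) =
\mathbf{g}(s,t) = \mathbf{f}(\varphi(s,t))$ in terms of coordinates:

$$\begin{cases} x = \xi(s,t) = x(u(s,t), v(s,t)),\\ y = \eta(s,t) = y(u(s,t), v(s,t)),\\ z = \zeta(s,t) = z(u(s,t), v(s,t)). \end{cases}$$

The chain rule then implies the following about the various Jacobians:

$$\frac{\partial(\eta,\zeta)}{\partial(s,t)} = \frac{\partial(y,z)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}, \qquad \frac{\partial(\zeta,\xi)}{\partial(s,t)} = \frac{\partial(z,x)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}, \qquad \frac{\partial(\xi,\eta)}{\partial(s,t)} = \frac{\partial(x,y)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}.$$

We now show that the first terms of $\Phi_{\mathbf{g}}$ and $\Phi_{\mathbf{f}}$ are equal:

$$\iint_{\vec{U}^*} X(\mathbf{g}(s,t))\,\frac{\partial(\eta,\zeta)}{\partial(s,t)}\,ds\,dt = \iint_{\vec{U}} X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)}\,du\,dv.$$

Equality of the other two pairs of terms can be established the same way. We begin
with the substitutions

$$\iint_{\vec{U}^*} X(\mathbf{g}(s,t))\,\frac{\partial(\eta,\zeta)}{\partial(s,t)}\,ds\,dt = \iint_{\vec{U}^*} X(\mathbf{f}(\varphi(s,t)))\,\frac{\partial(y,z)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}\,ds\,dt.$$

The oriented change of variables formula (Theorem 9.14, p. 357) then implies

$$\iint_{\vec{U}^*} X(\mathbf{f}(\varphi(s,t)))\,\frac{\partial(y,z)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}\,ds\,dt = \iint_{\varphi(\vec{U}^*)} X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)}\,du\,dv.$$

Because $\varphi(U^*) = U$, the proof is complete, by what we said above. □

Coordinate changes
from parametrizations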

Theorem 10.3. Suppose $\mathbf{f}: \Omega \to \mathbb{R}^3$ and $\mathbf{g}: \Omega^* \to \mathbb{R}^3$ both parametrize the
oriented surface patch $S$. Then there is an orientation-preserving coordinate change
$\varphi: \Omega^* \to \Omega$ for which $\mathbf{g}(s,t) = \mathbf{f}(\varphi(s,t))$ for all $(s,t)$ in $\Omega^*$.

Proof. The map $\mathbf{f}$ is 1-1 everywhere on $\Omega$, so its inverse $\mathbf{f}^{-1}$ is defined on $\mathbf{f}(\Omega) =
\mathbf{g}(\Omega^*)$. Consequently, we can define

$$\varphi(s,t) = \mathbf{f}^{-1}(\mathbf{g}(s,t))$$

for every $(s,t)$ in $\Omega^*$. Although $\varphi$ is 1-1 because $\mathbf{f}^{-1}$ and $\mathbf{g}$ are, it is not obvious that
it is also differentiable. The chain rule,

$$d\varphi_{(s,t)} = d(\mathbf{f}^{-1})_{\mathbf{g}(s,t)} \circ d\mathbf{g}_{(s,t)},$$

fails here, because the needed derivative of $\mathbf{f}^{-1}$ is not available. To see why, recall
that derivatives are linearizations. Because $\mathbf{g}$ maps an open subset of $\mathbb{R}^2$ to $\mathbb{R}^3$,
its linearization $d\mathbf{g}_{(s,t)}$ at any point $(s,t)$ is a map from $\mathbb{R}^2$ to $\mathbb{R}^3$. For the chain
rule to work, the linearization $d(\mathbf{f}^{-1})_{\mathbf{g}(s,t)}$ would have to map $\mathbb{R}^3$ back to $\mathbb{R}^2$, and that
would require $\mathbf{f}^{-1}$ itself to be defined on an open subset of $\mathbb{R}^3$. Unfortunately, $\mathbf{f}^{-1}$ is
undefined off $\mathbf{f}(\Omega)$: $d(\mathbf{f}^{-1})_{\mathbf{g}(s,t)}$ does not exist.

But differentiability is a local condition; let us give $\varphi$ a new local formulation
that makes its differentiability evident. Fix a point $(s_0,t_0)$ in $\Omega^*$, and let $(u_0,v_0) =
\varphi(s_0,t_0)$ and $(x_0,y_0,z_0) = \mathbf{g}(s_0,t_0) = \mathbf{f}(u_0,v_0)$. Because $\mathbf{f}$ is an immersion at $(u_0,v_0)$,
Theorem 6.20 (p. 212) provides a coordinate change $\mathbf{h}: N^3 \to \mathbb{R}^3$ defined on a
neighborhood $N^3$ of $(x_0,y_0,z_0)$ that makes $\mathbf{h} \circ \mathbf{f}$ an injection. That is, for every $(u,v)$
near $(u_0,v_0)$,

$$\mathbf{h} \circ \mathbf{f}(u,v) = (u,v,0).$$

If $\pi: \mathbb{R}^3 \to \mathbb{R}^2$ is the linear projection map that discards the third coordinate (i.e.,
$\pi(x,y,z) = (x,y)$), then

$$\pi \circ \mathbf{h} \circ \mathbf{f}(u,v) = (u,v).$$

In other words, $\pi \circ \mathbf{h}: N^3 \to \mathbb{R}^2$ plays the role of the inverse of $\mathbf{f}$ on $S$ near $(x_0,y_0,z_0)$,
but with the advantage that it is defined on a full 3-dimensional neighborhood of
$(x_0,y_0,z_0)$. Therefore, we set

$$\varphi(s,t) = \pi \circ \mathbf{h} \circ \mathbf{g}(s,t)$$

for all $(s,t)$ near $(s_0,t_0)$, and then have

$$d\varphi_{(s_0,t_0)} = \pi \circ d\mathbf{h}_{(x_0,y_0,z_0)} \circ d\mathbf{g}_{(s_0,t_0)}.$$

Because $\mathbf{h}$ and $\mathbf{g}$ are continuously differentiable and $(s_0,t_0)$ was an arbitrary point
in $\Omega^*$, $\varphi$ is continuously differentiable on $\Omega^*$. A similar argument shows that $\varphi^{-1}$ is
continuously differentiable. Because $U^*$ and $U = \varphi(U^*)$ are both positively oriented
(by definition of $S$), $\varphi$ preserves orientation (Theorem 9.13, p. 355). □

Corollary 10.4 Suppose $\mathbf{f}: \Omega \to \mathbb{R}^3$ and $\mathbf{g}: \Omega^* \to \mathbb{R}^3$ both parametrize the oriented
surface patch $S$. Then $\Phi_{\mathbf{f}} = \Phi_{\mathbf{g}}$; total flux of a fluid through $S$ is independent of the
parametrization. □

Invariance of Φ

Although the formula for $\Phi$ gives the same value no matter which parametrization
is used to compute it, that formula is nevertheless bound to a parametrization.
The invariance of $\Phi$ would be reflected better by an intrinsic formula, one not bound
to a parametrization. The existing expression,

$$\Phi = \iint_{\vec{U}} \left( X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)} + Y(\mathbf{f}(u,v))\,\frac{\partial(z,x)}{\partial(u,v)} + Z(\mathbf{f}(u,v))\,\frac{\partial(x,y)}{\partial(u,v)} \right) du\,dv,$$

is a double integral over a portion of the $(u,v)$ parameter plane; an intrinsic formula
would eliminate those parameters. The three Jacobians that appear here are the sort
that would be “transformed away” when we change variables in an oriented double
integral (e.g., using Theorem 9.14, p. 357). For example,

$$\frac{\partial(y,z)}{\partial(u,v)}\,du\,dv \quad\text{would be replaced by}\quad dy\,dz.$$

If we make that replacement here, and similarly replace

$$\frac{\partial(z,x)}{\partial(u,v)}\,du\,dv \ \text{ by } \ dz\,dx, \qquad \frac{\partial(x,y)}{\partial(u,v)}\,du\,dv \ \text{ by } \ dx\,dy,$$

$\mathbf{f}(u,v)$ by $(x,y,z)$, and $\vec{U}$ by $\vec{S}$, then no trace of the original parameters remains, and
we are left with

$$\Phi = \iint_{\vec{S}} X(x,y,z)\,dy\,dz + Y(x,y,z)\,dz\,dx + Z(x,y,z)\,dx\,dy.$$

Surface integrals

This is a new kind of object called a surface integral. It provides the intrinsic
formula we seek, expressing $\Phi$ solely in terms of the oriented surface patch $S$ and
the flow field $\mathbf{V}$ (by its component functions $X$, $Y$, and $Z$).


Definition 10.4 Suppose the vector field $\mathbf{V}(x,y,z) = (X, Y, Z)$ is defined and
continuous on the oriented surface patch $S$; the surface integral of $\mathbf{V}$ over $S$ is the
expression

$$\iint_{\vec{S}} X(x,y,z)\,dy\,dz + Y(x,y,z)\,dz\,dx + Z(x,y,z)\,dx\,dy$$

whose value is given by the double integral

$$\iint_{\vec{U}} \left( X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)} + Y(\mathbf{f}(u,v))\,\frac{\partial(z,x)}{\partial(u,v)} + Z(\mathbf{f}(u,v))\,\frac{\partial(x,y)}{\partial(u,v)} \right) du\,dv,$$

where $\mathbf{f}: \Omega \to \mathbb{R}^3$ is any parametrization of $S = \mathbf{f}(U)$.

In effect, the parametrization pulls back the surface integral from $S$ in $\mathbb{R}^3$ to a double
integral on $U$ in $\mathbb{R}^2$. Corollary 10.4 implies that the value of the surface integral is
independent of the parametrization of $S$.

Theorem 10.5. When the orientation of $S$ is reversed, the surface integral changes
sign:

$$\iint_{-\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy = -\iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy.$$

Proof. Suppose $\mathbf{f}: \Omega \to \mathbb{R}^3$ parametrizes the oriented surface patch $-S$; then, by
definition, $-S = \mathbf{f}(U)$ for some positively oriented region $U \subset \Omega$. Let $L: \mathbb{R}^2 \to \mathbb{R}^2$
be the orientation-reversing linear map (reflection)

$$(u,v) = L(s,t) = (t,s),$$

and let $\Omega^* = L^{-1}(\Omega)$. Because $L^{-1}$ reverses orientation, the induced orientation
(p. 355) on the image $U^* = L^{-1}(-U)$ is positive. Define $\mathbf{g} = \mathbf{f} \circ L$; then $\mathbf{g}: \Omega^* \to \mathbb{R}^3$
parametrizes

$$\mathbf{g}(U^*) = \mathbf{f}(L(U^*)) = \mathbf{f}(-U) = -\mathbf{f}(U) = S$$

itself, because $U^*$ is positively oriented.

The expressions involved in proving the two surface integrals equal are long
and complicated. To simplify our work, we deal only with the first terms of the
integrals; the second and third terms can be dealt with the same way. First use the
parametrization $\mathbf{f}$ to get

$$\iint_{-\vec{S}} X\,dy\,dz = \iint_{\vec{U}} X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)}\,du\,dv.$$

Now use the change of variables $(u,v) = L(s,t)$, $\mathbf{f}(u,v) = \mathbf{f}(L(s,t)) = \mathbf{g}(s,t)$ and the
oriented change of variables formula (Theorem 9.14, p. 357) to write

$$\begin{aligned} \iint_{\vec{U}} X(\mathbf{f}(u,v))\,\frac{\partial(y,z)}{\partial(u,v)}\,du\,dv &= \iint_{L^{-1}(\vec{U})} X(\mathbf{f}(L(s,t)))\,\frac{\partial(y,z)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}\,ds\,dt \\ &= \iint_{-\vec{U}^*} X(\mathbf{g}(s,t))\,\frac{\partial(y,z)}{\partial(s,t)}\,ds\,dt = -\iint_{\vec{U}^*} X(\mathbf{g}(s,t))\,\frac{\partial(y,z)}{\partial(s,t)}\,ds\,dt. \end{aligned}$$

The last integral is a parametric representation of the surface integral

$$-\iint_{\vec{S}} X\,dy\,dz;$$

by what was said above, this proves the theorem. □
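The sign change under the reflection $L(s,t) = (t,s)$ is easy to observe numerically. The following sketch rests on our own choices: the surface is the upper unit hemisphere in latitude-longitude coordinates, the field is $\mathbf{V} = (0, 0, z)$ (whose outward flux through that hemisphere is $2\pi/3$), and the helper names are hypothetical. Jacobians are estimated by central differences and the integral by a midpoint Riemann sum.

```python
import math

def cross(p, q):
    """Cross product p x q of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def orienting_normal(f, u, v, h=1e-5):
    """Central-difference estimate of N_f(u, v) = f_u x f_v."""
    fu = [(p - m) / (2 * h) for p, m in zip(f(u + h, v), f(u - h, v))]
    fv = [(p - m) / (2 * h) for p, m in zip(f(u, v + h), f(u, v - h))]
    return cross(fu, fv)

def total_flux(V, f, u_range, v_range, n=100):
    """Midpoint Riemann sum of V(f(u,v)) . N_f(u,v) du dv over a rectangle."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = u0 + (i + 0.5) * du, v0 + (j + 0.5) * dv
            N = orienting_normal(f, u, v)
            total += sum(a * b for a, b in zip(V(*f(u, v)), N)) * du * dv
    return total

def f(theta, phi):
    """Latitude-longitude parametrization of the unit sphere."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            math.sin(phi))

def g(s, t):
    """g = f o L with L(s, t) = (t, s): the same points, parameters swapped."""
    return f(t, s)

V = lambda x, y, z: (0.0, 0.0, z)

# Upper hemisphere: theta in [-pi, pi], phi in [0, pi/2]; for g the roles of
# the two parameter intervals are exchanged.
flux_f = total_flux(V, f, (-math.pi, math.pi), (0, math.pi / 2))
flux_g = total_flux(V, g, (0, math.pi / 2), (-math.pi, math.pi))
print(flux_f, flux_g)
```

Swapping the parameters exchanges the two columns of the derivative, so the orienting normal, and with it the flux, changes sign.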

The form of path and
surface integrals

Note the similarity in form between a surface integral and a path integral (for a
path that lies in space):

$$\iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy \quad\text{versus}\quad \int_{\vec{C}} P\,dx + Q\,dy + R\,dz.$$

In these integrals, the physical vectors, flux density $\mathbf{V} = (X, Y, Z)$ and force $\mathbf{F} =
(P, Q, R)$, are represented by their components. However, for the path integral there
is an alternate form in which $\mathbf{F}$ itself appears:

$$\text{work} = \int_{\vec{C}} P\,dx + Q\,dy + R\,dz = \int_C \mathbf{F} \cdot \mathbf{t}\,ds.$$

Here $\mathbf{t}$ is the unit tangent that orients the path $C$ (see p. 19), and $ds$ is the “element
of arc length” for the unoriented path $C$. This alternate form is an unoriented path
integral; information about the orientation of $C$ has been transferred to the integrand,
to the factor $\mathbf{t}$.

An alternate form for
a surface integral

The surface integral also has an alternate form that is analogous to the second
path integral. We can derive that new form by reconstructing our estimates for total
flux through $S$. In the original construction, we began with a collection of oriented
parallelograms $P_1, \dots, P_I$ that approximated $S$; $\Phi(P_i)$ was the total flux through $P_i$,
and the sum

$$\sum_{i=1}^{I} \Phi(P_i)$$

estimated total flux through $S$ itself. If $P_i = \mathbf{p}_i \wedge \mathbf{q}_i$, and $\mathbf{V}_i$ was the value of $\mathbf{V}$ at the
corner $\mathbf{f}(u_i, v_i)$ of $P_i$, then (p. 395)

$$\Phi(P_i) \approx \mathbf{V}_i \cdot \mathbf{p}_i \times \mathbf{q}_i = \left( X\,\frac{\partial(y,z)}{\partial(u,v)} + Y\,\frac{\partial(z,x)}{\partial(u,v)} + Z\,\frac{\partial(x,y)}{\partial(u,v)} \right)_{(u_i,v_i)} \Delta u\,\Delta v.$$

The expressions on the right are the terms in a convergent Riemann sum; in the limit
they give the surface integral

$$\Phi = \iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy.$$

Rewriting the
cross-product

To construct the second, alternate, form of the surface integral, note that our
estimate for $\Phi(P_i)$ used the component form of the cross-product $\mathbf{p}_i \times \mathbf{q}_i$. We now
switch to the geometric form,

$$\mathbf{p}_i \times \mathbf{q}_i = \mathbf{n}_i\,\Delta A_i,$$

in which $\Delta A_i > 0$ is the absolute area of $P_i$ and $\mathbf{n}_i$ is its orienting unit normal. In
terms of these geometric variables,

$$\sum_{i=1}^{I} \Phi(P_i) \approx \sum_{i=1}^{I} \mathbf{V}_i \cdot \mathbf{n}_i\,\Delta A_i.$$

This is a Riemann sum; if we follow the usual pattern in expressing its limit as an
integral, we get

$$\Phi = \lim_{\substack{I \to \infty \\ \Delta A_i \to 0}} \sum_{i=1}^{I} \mathbf{V}_i \cdot \mathbf{n}_i\,\Delta A_i = \iint_S \mathbf{V} \cdot \mathbf{n}\,dA.$$

This is the alternate form for a surface integral; that is,

$$\iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy = \iint_S \mathbf{V} \cdot \mathbf{n}\,dA.$$

The integrand $\mathbf{V} \cdot \mathbf{n}$ is the normal component of flux density on $S$; the domain
of integration is the unoriented surface patch $S$. We call $dA$ the element of surface
area for $S$. Information about the orientation of $S$ has been transferred from
the domain of integration to the integrand. Compare the new integral to the original
expression $\mathbf{V} \cdot \mathbf{n}\,\Delta A$ for total flux through a plane region (cf. pp. 387-389 and
Definition 10.1).

Area and
scalar integrals
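The two forms of the surface integral can be compared numerically on a surface whose unit normal is known independently. The sketch below assumes the unit sphere in latitude-longitude coordinates, with outward unit normal $\mathbf{n} = (x,y,z)$, and the field $\mathbf{V} = (0,0,z)$, whose outward flux is $4\pi/3$; the element of area is taken to be the length of the orienting normal times $du\,dv$. All function names are ours.

```python
import math

def cross(p, q):
    """Cross product p x q of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dot(a, b):
    """Euclidean dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def orienting_normal(f, u, v, h=1e-5):
    """Central-difference estimate of N_f(u, v) = f_u x f_v."""
    fu = [(p - m) / (2 * h) for p, m in zip(f(u + h, v), f(u - h, v))]
    fv = [(p - m) / (2 * h) for p, m in zip(f(u, v + h), f(u, v - h))]
    return cross(fu, fv)

def midpoints(u_range, v_range, n):
    """Yield (u, v, du*dv) for the cell midpoints of an n-by-n grid."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    for i in range(n):
        for j in range(n):
            yield u0 + (i + 0.5) * du, v0 + (j + 0.5) * dv, du * dv

def flux_components(V, f, u_range, v_range, n=100):
    """Oriented form: Riemann sum of V . N_f du dv."""
    return sum(dot(V(*f(u, v)), orienting_normal(f, u, v)) * w
               for u, v, w in midpoints(u_range, v_range, n))

def flux_normal(V, unit_normal, f, u_range, v_range, n=100):
    """Unoriented form: Riemann sum of (V . n) dA with dA = |N_f| du dv."""
    total = 0.0
    for u, v, w in midpoints(u_range, v_range, n):
        p = f(u, v)
        N = orienting_normal(f, u, v)
        dA = math.sqrt(dot(N, N)) * w
        total += dot(V(*p), unit_normal(*p)) * dA
    return total

def sphere(theta, phi):
    """Latitude-longitude parametrization of the unit sphere."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            math.sin(phi))

V = lambda x, y, z: (0.0, 0.0, z)
outward = lambda x, y, z: (x, y, z)   # unit normal of the unit sphere
U = ((-math.pi, math.pi), (-math.pi / 2, math.pi / 2))

a = flux_components(V, sphere, *U)
b = flux_normal(V, outward, sphere, *U)
print(a, b, 4 * math.pi / 3)
```

The first sum carries the orientation in the parametrization; the second carries it only in the choice of the unit normal supplied to the integrand.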

In the next section, we discuss integrals of the general form

$$\iint_S f(\mathbf{x})\,dA,$$

where $f$ is a scalar function defined on a region in space that contains the surface
patch $S$. In particular, $f(\mathbf{x}) = 1$ leads to a notion of area for $S$ and indicates why we
think of $dA$ as the element of surface area for $S$.

In this section, we define the area of a curved surface in space as an integral. Using
that as a basis, we then define the integral of a scalar function over a curved
surface. Surface area is analogous to arc length, and scalar integrals over surfaces are
similarly analogous to scalar integrals over curved paths.

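Approximating
a surface patch

To define the area of an unoriented surface patch $S$, we begin with a
parametrization $\mathbf{f}: \Omega \to \mathbb{R}^3$ of $S$. Thus $\mathbf{f}$ is a continuously differentiable 1-1 immersion on
an open set $\Omega$, $U \subset \Omega$ is a closed, bounded, unoriented set with area, and $S = \mathbf{f}(U)$.
Using $\mathbf{f}$ and its derivative, we can approximate $S$ by a collection of parallelograms
whose total area will give us an estimate for the area of $S$. This is essentially the
procedure described beginning on page 393. Pick one of the grids $\mathcal{J}_k$ used to define
Jordan content in the plane, and select the square cells $Q_1, \dots, Q_I$ of $\mathcal{J}_k$ that meet $U$.
Let $(u_i, v_i)$ be the lower-left corner of $Q_i$, and let

$$P_i = d\mathbf{f}_{(u_i,v_i)}(Q_i).$$

This is a parallelogram tangent to $S$ at $\mathbf{f}(u_i, v_i)$; the parallelograms together
give us an approximation of $S$ that improves as $k \to \infty$. The edges of $Q_i$ are multiples
of the standard basis vectors that we can write as

$$\Delta u\,\mathbf{e}_1 \quad\text{and}\quad \Delta v\,\mathbf{e}_2,$$

where $\Delta u = \Delta v = 1/2^k$. The corresponding edges of $P_i$ are

$$\mathbf{p}_i = \Delta u\,d\mathbf{f}_{(u_i,v_i)}(\mathbf{e}_1), \qquad \mathbf{q}_i = \Delta v\,d\mathbf{f}_{(u_i,v_i)}(\mathbf{e}_2);$$

they are multiples of the columns of $d\mathbf{f}_{(u_i,v_i)}$. If

$$\mathbf{f}: \begin{cases} x = x(u,v),\\ y = y(u,v),\\ z = z(u,v), \end{cases} \qquad d\mathbf{f}_{(a,b)} = \begin{pmatrix} x_u(a,b) & x_v(a,b) \\ y_u(a,b) & y_v(a,b) \\ z_u(a,b) & z_v(a,b) \end{pmatrix},$$

then (p. 393)

$$\mathbf{p}_i \times \mathbf{q}_i = \left( \frac{\partial(y,z)}{\partial(u,v)},\ \frac{\partial(z,x)}{\partial(u,v)},\ \frac{\partial(x,y)}{\partial(u,v)} \right)_{(u_i,v_i)} \Delta u\,\Delta v,$$

and the area of $P_i$ is

$$\|\mathbf{p}_i \times \mathbf{q}_i\| = \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ \Bigg|_{(u_i,v_i)} \Delta u\,\Delta v.$$

Area of $P_i$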

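This is the ordinary nonnegative area of an unoriented region. The total area of the
parallelograms that approximate $S$ is therefore

$$\sum_{i=1}^{I} \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ \Bigg|_{(u_i,v_i)} \Delta u\,\Delta v.$$

Area for a given
parametrization

This is a Riemann sum for the double integral

$$\text{area}_{\mathbf{f}}(S) = \iint_U \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv.$$

Because $\mathbf{f}$ has continuous first derivatives, the integrand is continuous and the
Riemann sums converge to the integral as $k \to \infty$. If we consider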


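$$M(U) = \text{area}_{\mathbf{f}}(S) = \iint_U \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv$$

to be a set function defined on closed bounded subsets $U \subset \Omega$ that have area (cf.
pp. 310-312), then its derivative is

$$M'(u,v) = \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }$$

(Theorem 8.39, p. 312). This implies that, for a set $U$ containing the point $(u,v)$,

$$\frac{\text{area}_{\mathbf{f}}(S)}{\text{area}(U)} \approx \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }$$

as closely as we wish by making the diameter of $U$ sufficiently small. It is for this
reason that we defined (Definition 4.9, p. 139)

$$\sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }$$

to be the local area multiplier for $\mathbf{f}$.

Local area multiplier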

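Does the value of the integral for surface area depend on the parametrization?
Let $\mathbf{g}: \Omega^* \to \mathbb{R}^3$ be another parametrization of $S$, where

$$(x,y,z) = \mathbf{g}(s,t) = (\xi(s,t),\ \eta(s,t),\ \zeta(s,t)).$$

There is a closed bounded set $U^* \subset \Omega^*$ with area, for which $\mathbf{g}(U^*) = S$ and

$$\text{area}_{\mathbf{g}}(S) = \iint_{U^*} \sqrt{ \left[ \frac{\partial(\eta,\zeta)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(\zeta,\xi)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(\xi,\eta)}{\partial(s,t)} \right]^2 }\ ds\,dt.$$

Invariance of
surface area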

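Theorem 10.6. Surface area is independent of the parametrization used to compute
it: $\text{area}_{\mathbf{g}}(S) = \text{area}_{\mathbf{f}}(S)$.

Proof. If we set orientation aside, the proof of Theorem 10.3 provides a coordinate
change $\varphi: \Omega^* \to \Omega$, $(u,v) = \varphi(s,t)$, for which $\varphi(U^*) = U$ and $\mathbf{g}(s,t) = \mathbf{f}(\varphi(s,t))$
for all $(s,t)$ in $\Omega^*$. The chain rule implies

$$\frac{\partial(\eta,\zeta)}{\partial(s,t)} = \frac{\partial(y,z)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}, \qquad \frac{\partial(\zeta,\xi)}{\partial(s,t)} = \frac{\partial(z,x)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}, \qquad \frac{\partial(\xi,\eta)}{\partial(s,t)} = \frac{\partial(x,y)}{\partial(u,v)}\,\frac{\partial(u,v)}{\partial(s,t)}.$$

Therefore, on $\Omega^*$ we have

$$\sqrt{ \left[ \frac{\partial(\eta,\zeta)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(\zeta,\xi)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(\xi,\eta)}{\partial(s,t)} \right]^2 } = \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ \left| \frac{\partial(u,v)}{\partial(s,t)} \right|.$$

Now make this substitution in the formula for $\text{area}_{\mathbf{g}}(S)$, and then use the basic
change of variables formula (Theorem 9.11, p. 350) to get

$$\begin{aligned} \text{area}_{\mathbf{g}}(S) &= \iint_{U^*} \sqrt{ \left[ \frac{\partial(\eta,\zeta)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(\zeta,\xi)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(\xi,\eta)}{\partial(s,t)} \right]^2 }\ ds\,dt \\ &= \iint_{U^*} \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ \left| \frac{\partial(u,v)}{\partial(s,t)} \right| ds\,dt \\ &= \iint_{U} \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv = \text{area}_{\mathbf{f}}(S). \qquad \Box \end{aligned}$$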
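As an illustration of Theorem 10.6, the area of the upper unit hemisphere can be computed from two different parametrizations. This is a sketch under our own assumptions: the first parametrization is the latitude-longitude map, the second is the stereographic map $\mathbf{g}$ of the earlier example written in polar form over the rectangle $0 \le \rho \le 1$, $0 \le t \le 2\pi$, the local area multiplier is estimated by central differences, and the function names are hypothetical. Both sums should be close to $2\pi$.

```python
import math

def cross(p, q):
    """Cross product p x q of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def area_multiplier(f, u, v, h=1e-5):
    """Local area multiplier |f_u x f_v|, estimated by central differences."""
    fu = [(p - m) / (2 * h) for p, m in zip(f(u + h, v), f(u - h, v))]
    fv = [(p - m) / (2 * h) for p, m in zip(f(u, v + h), f(u, v - h))]
    N = cross(fu, fv)
    return math.sqrt(sum(c * c for c in N))

def surface_area(f, u_range, v_range, n=100):
    """Midpoint Riemann sum of the local area multiplier over a rectangle."""
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    return sum(area_multiplier(f, u0 + (i + 0.5) * du, v0 + (j + 0.5) * dv)
               for i in range(n) for j in range(n)) * du * dv

def f(theta, phi):
    """Latitude-longitude parametrization of the unit sphere."""
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            math.sin(phi))

def g_polar(rho, t):
    """Stereographic map g(u, v) with (u, v) = (rho cos t, rho sin t)."""
    u, v = rho * math.cos(t), rho * math.sin(t)
    D = 1 + u * u + v * v
    return (2 * u / D, 2 * v / D, (1 - u * u - v * v) / D)

area_f = surface_area(f, (-math.pi, math.pi), (0, math.pi / 2))
area_g = surface_area(g_polar, (0, 1), (0, 2 * math.pi))
print(area_f, area_g, 2 * math.pi)
```

Because only the length of the normal enters, the orientation of either parametrization plays no role here.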


Definition 10.5 Let $\mathbf{f}: \Omega \to \mathbb{R}^3 : (u,v) \to (x,y,z)$ be any parametrization of the
surface patch $S = \mathbf{f}(U)$; then the surface area of $S$ is

$$A(S) = \iint_U \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv.$$

Surface area of S

Although the value of $A(S)$ is independent of the parametrization used to compute
it, our expression for $A(S)$ is still bound to a parametrization. As with total flux, the
invariance of $A(S)$ would be reflected better by an intrinsic formula, one not bound
to a parametrization. We can get that formula by looking at the areas of small cells
on $S$.

[Figure: a grid cell with sides $\Delta u$ and $\Delta v$ in the $(u,v)$-plane and its image, a small cell on the surface $S$.]

An intrinsic formula
for surface area

Comparison with
arc length

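Let $\mathbf{f}: \Omega \to \mathbb{R}^3$ be a parametrization of $S$ with $\mathbf{f}(U) = S$, and let $Q_i$ be a cell
of the grid $\mathcal{J}_k$ that meets $U$. Suppose the image $\mathbf{f}(Q_i)$ has area $\Delta A_i$ as given by
Definition 10.5. Then, using the local area multiplier for $\mathbf{f}$ (p. 406), we have

$$\Delta A_i \approx \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ \Bigg|_{(u_i,v_i)} \Delta u\,\Delta v,$$

so

$$\sum_{i=1}^{I} \Delta A_i \approx \sum_{i=1}^{I} \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ \Bigg|_{(u_i,v_i)} \Delta u\,\Delta v.$$

These sums both converge to $A(S)$. We write the limit on the left as an integral,
following the usual pattern:

$$\lim_{\substack{I \to \infty \\ \Delta A_i \to 0}} \sum_{i=1}^{I} \Delta A_i = \iint_S dA.$$

This gives us the simple intrinsic expression

$$A(S) = \iint_S dA$$

for the surface area of $S$. Comparing the intrinsic with the parametric expression for
$A(S)$, we can see why

$$dA = \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv$$

is described as the element of surface area on $S$.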

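There are striking similarities between surface area and arc length. If the path $C$
in $\mathbb{R}^3$ is parametrized by

$$\mathbf{f}(u) = (x(u), y(u), z(u)), \qquad a \le u \le b,$$

then the element of arc length on $C$ is

$$ds = \sqrt{ \left[ \frac{dx}{du} \right]^2 + \left[ \frac{dy}{du} \right]^2 + \left[ \frac{dz}{du} \right]^2 }\ du,$$

and

$$\text{arc length of } C = \int_C ds = \int_a^b \sqrt{ \left[ \frac{dx}{du} \right]^2 + \left[ \frac{dy}{du} \right]^2 + \left[ \frac{dz}{du} \right]^2 }\ du.$$

For the integral of a scalar function $H(x,y,z)$ over the path $C$ (Definition 1.6 and
Theorem 1.5, p. 18), we have

$$\int_C H(x,y,z)\,ds = \int_a^b H(\mathbf{f}(u)) \sqrt{ \left[ \frac{dx}{du} \right]^2 + \left[ \frac{dy}{du} \right]^2 + \left[ \frac{dz}{du} \right]^2 }\ du.$$

As noted after the proof of Theorem 1.5, the value of the integral on the left does not
depend on either the orientation of $C$ or the parametrization of $C$ used in the integral
on the right.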

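Definition 10.6 Let $S$ be a surface patch in $\mathbb{R}^3$, and let $H(x,y,z)$ be a continuous
function defined on $S$. We set

$$\iint_S H(x,y,z)\,dA = \iint_U H(\mathbf{f}(u,v)) \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv,$$

where $\mathbf{f}: \Omega \to \mathbb{R}^3$ is a parametrization of $S = \mathbf{f}(U)$.

For the surface integral on the left to be well defined, its value must be independent
of the parametrization used for $S$ on the right.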

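Theorem 10.7. Let $\mathbf{f}: \Omega \to \mathbb{R}^3$ and $\mathbf{g}: \Omega^* \to \mathbb{R}^3$ be two parametrizations of $S =
\mathbf{f}(U) = \mathbf{g}(U^*)$, with $U \subset \Omega$ and $U^* \subset \Omega^*$. Then

$$\iint_U H(\mathbf{f}(u,v)) \sqrt{ \left[ \frac{\partial(y,z)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(u,v)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(u,v)} \right]^2 }\ du\,dv = \iint_{U^*} H(\mathbf{g}(s,t)) \sqrt{ \left[ \frac{\partial(y,z)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(z,x)}{\partial(s,t)} \right]^2 + \left[ \frac{\partial(x,y)}{\partial(s,t)} \right]^2 }\ ds\,dt,$$

for any continuous function $H(x,y,z)$ defined on $S$.

Proof. See Exercise 10.7. □

For example, if $H = \rho$ is mass density at a point of $S$, then the integral of $H$ is
the total mass of $S$. If $H$ is density of electric charge (a signed quantity that may be
negative), then the integral of $H$ is the total electric charge on $S$. If $H = \mathbf{V} \cdot \mathbf{n}$, where
$\mathbf{V}$ is a flux density and $\mathbf{n}$ is an orienting unit normal for the oriented surface patch $S$,
then the integral of $H$ over $S$ is total flux of $\mathbf{V}$ through $S$.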

The gravitational field of a hollow sphere is yet another example of a surface 
integral, one that we now construct by adapting the work we did in Chapter 8.1 on 
the field of a flat plate. Let the sphere be a unit sphere $S$ centered at the origin of
$(x,y,z)$-space; suppose it has negligible thickness and has uniform density $\rho$ (mass
per unit area). By symmetry, it is enough to determine the vertical component of 
the gravitational field at a test point a = (0,0, a) on the positive z-axis. Symmetry 
guarantees that the x- and y-components of the field at a are zero; as we find, it 
matters whether the test point is inside or outside the sphere. 

Let $S$ be divided into small regions $S_1, \ldots, S_I$; suppose $S_i$ has area $\Delta A_i$ and
contains the point $\mathbf{p}_i = (x_i, y_i, z_i)$. Let

\[
\mathbf{r}_i = \mathbf{p}_i - \mathbf{a} = (x_i, y_i, z_i - a)
\]

be the vector from the test point to $\mathbf{p}_i$. Then the gravitational field at the test point
that is due to the region $S_i$ is approximately

\[
\mathbf{F}_i \approx \frac{G\rho\,\Delta A_i}{\|\mathbf{r}_i\|^3}\,\mathbf{r}_i
  = \frac{G\rho\,\Delta A_i}{\left(x_i^2 + y_i^2 + (z_i - a)^2\right)^{3/2}}\,(x_i, y_i, z_i - a),
\]

where $G$ is the usual gravitational constant. We can therefore approximate the
$z$-component of the gravitational field for the whole sphere $S$ by the (scalar) sum

\[
\text{field} \approx G\rho \sum_{i=1}^{I} \frac{z_i - a}{\left(x_i^2 + y_i^2 + (z_i - a)^2\right)^{3/2}}\,\Delta A_i.
\]


Now let $I \to \infty$ and let the maximum diameter of $S_i$ tend to zero; in the limit, the
sum becomes the surface integral

\[
\text{field} = G\rho \iint_S \frac{z - a}{\left(x^2 + y^2 + (z - a)^2\right)^{3/2}}\, dA
  = G\rho \iint_S \frac{z - a}{\left(1 + a^2 - 2az\right)^{3/2}}\, dA
\]

(because $x^2 + y^2 + z^2 = 1$ on $S$).

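To evaluate the surface integral, we use the parametrization

\[
\begin{cases}
x = \cos\theta\cos\varphi,\\
y = \sin\theta\cos\varphi,\\
z = \sin\varphi;
\end{cases}
\qquad
U:\ 
\begin{aligned}
-\pi &\le \theta \le \pi,\\
-\pi/2 &\le \varphi \le \pi/2,
\end{aligned}
\]

of the unit sphere, keeping in mind the caveat we made when using the parametrization
to calculate total flux through the sphere (p. 396). From that earlier work, we
find

\[
dA = \sqrt{\left[\frac{\partial(y,z)}{\partial(\theta,\varphi)}\right]^2
         + \left[\frac{\partial(z,x)}{\partial(\theta,\varphi)}\right]^2
         + \left[\frac{\partial(x,y)}{\partial(\theta,\varphi)}\right]^2}\; d\theta\,d\varphi
   = \cos\varphi\; d\theta\,d\varphi.
\]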


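We can now compute the field (e.g., using a table of integrals or a computer algebra
system):

\[
\begin{aligned}
\text{field} &= G\rho \int_{-\pi}^{\pi} d\theta \int_{-\pi/2}^{\pi/2}
      \frac{(\sin\varphi - a)\cos\varphi}{\left(1 + a^2 - 2a\sin\varphi\right)^{3/2}}\, d\varphi \\
  &= 2\pi G\rho \left. \frac{1 - a\sin\varphi}{a^2\sqrt{1 + a^2 - 2a\sin\varphi}} \right|_{-\pi/2}^{\pi/2} \\
  &= \frac{2\pi G\rho}{a^2}\left( \frac{1 - a}{\sqrt{(1 - a)^2}} - \frac{1 + a}{\sqrt{(1 + a)^2}} \right)
   = \frac{2\pi G\rho}{a^2}\left( \frac{1 - a}{|1 - a|} - \frac{1 + a}{|1 + a|} \right).
\end{aligned}
\]

By assumption, $a > 0$; therefore $|1 + a| = 1 + a$ and $(1 + a)/|1 + a| = 1$. The first
term is more interesting:

\[
\frac{1 - a}{|1 - a|} =
\begin{cases}
\phantom{-}1 & \text{if } a < 1,\\
-1 & \text{if } a > 1.
\end{cases}
\]

Therefore,

\[
\text{field} =
\begin{cases}
0 & \text{if } a < 1,\\[4pt]
-\dfrac{4\pi G\rho}{a^2} & \text{if } a > 1.
\end{cases}
\]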


Inside, the sphere induces no gravitational field whatsoever. Outside, the sphere acts
as if all its mass $4\pi\rho$ were concentrated at the origin. On the sphere itself, the field
is discontinuous. When $a = 1$, the value of the field is the average of its inside and
outside values; see Exercise 10.8.
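The two cases can be checked by direct numerical integration. The sketch below is an illustration rather than part of the text: the function name `shell_field` and the sample values $a = 0.5$ and $a = 2$ are assumptions, and the field is reported in units of $G\rho$. It applies the midpoint rule to the single integral in $\varphi$ that remains after the $\theta$-integration contributes the factor $2\pi$.

```python
import math

def shell_field(a, n=20000):
    """z-component of the field at (0, 0, a), in units of G*rho.

    Midpoint rule for 2*pi * integral over -pi/2 <= phi <= pi/2 of
    (sin(phi) - a) cos(phi) / (1 + a^2 - 2 a sin(phi))^(3/2).
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        phi = -math.pi / 2 + (i + 0.5) * h
        s = math.sin(phi)
        total += (s - a) * math.cos(phi) / (1 + a * a - 2 * a * s) ** 1.5 * h
    return 2 * math.pi * total

inside = shell_field(0.5)    # test point inside the sphere
outside = shell_field(2.0)   # test point outside the sphere
print(inside, outside, -4 * math.pi / 2.0**2)
```

The inside value should be close to zero and the outside value close to $-4\pi/a^2$; the negative sign records that the field at a point on the positive $z$-axis points back toward the sphere.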


The field vanishes 
inside the sphere 




The discontinuity occurs where the $z$-axis passes through the sphere. If we put
small holes at the north and south poles, then there is no matter on the $z$-axis, so the
field should become continuous. The graphs on the right, above, show what happens.
To determine the new field, we can use the same surface integral, parametrized the
same way, but with $\varphi$ restricted to

\[
-\pi/2 + \varepsilon \le \varphi \le \pi/2 - \varepsilon,
\]

where $\varepsilon > 0$ is some small number. Thus,

\[
\begin{aligned}
\text{field} &= 2\pi G\rho \left. \frac{1 - a\sin\varphi}{a^2\sqrt{1 + a^2 - 2a\sin\varphi}} \right|_{-\pi/2+\varepsilon}^{\pi/2-\varepsilon} \\
  &= \frac{2\pi G\rho}{a^2}\left( \frac{1 - a\cos\varepsilon}{\sqrt{1 + a^2 - 2a\cos\varepsilon}}
       - \frac{1 + a\cos\varepsilon}{\sqrt{1 + a^2 + 2a\cos\varepsilon}} \right).
\end{aligned}
\]

(Note that $\sin(\pm(\pi/2 - \varepsilon)) = \pm\sin(\pi/2 - \varepsilon) = \pm\cos\varepsilon$.) This is indeed a continuous
function of $a$; the graphs show $\varepsilon = 0.2$, $0.075$, and $0.01$. It is evident that the field
strength fades away inside the sphere as the holes close up (i.e., as $\varepsilon \to 0$), and the
field develops a discontinuity where the $z$-axis meets the sphere.
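The behavior described here can be tabulated from the closed form. The following sketch is illustrative only: the function name `field_with_holes` and the sample test points are assumptions, and values are in units of $G\rho$.

```python
import math

def field_with_holes(a, eps):
    """Field at (0, 0, a), in units of G*rho, for a unit sphere with
    polar holes of angular radius eps (closed form from the text)."""
    c = math.cos(eps)
    first = (1 - a * c) / math.sqrt(1 + a * a - 2 * a * c)
    second = (1 + a * c) / math.sqrt(1 + a * a + 2 * a * c)
    return 2 * math.pi / a**2 * (first - second)

# The three hole sizes mentioned in the text, at an inside and an outside point
for eps in (0.2, 0.075, 0.01):
    print(eps, field_with_holes(0.5, eps), field_with_holes(2.0, eps))
```

As $\varepsilon$ shrinks, the inside values tend to $0$ and the outside values tend to $-4\pi/a^2$, while for each fixed $\varepsilon > 0$ the formula has no jump at $a = 1$.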

There is also an ingenious geometric argument (see, e.g., the Feynman Lectures [6])
that explains why the field vanishes inside a hollow sphere. However, that
argument relies on the symmetry of the sphere and cannot be easily modified when
we break the symmetry with holes at the poles. By contrast, the surface integral for
the field still works.

Not every surface can be represented as a single surface patch. For one thing, the
parametrization that defines a patch is 1-1, and the domain of the parametrization
has a boundary; therefore the patch itself must have a boundary. A surface without
boundary (e.g., a sphere or a torus) cannot be a surface patch. Furthermore, because
a surface patch has a well-defined tangent plane at each point, no surface with edges
or corners (e.g., a cube) can be a surface patch, either. However, because each of
these examples can be assembled from a finite collection of surface patches, we are
led to define a surface $S$ as a union of surface patches $S_1, \ldots, S_k$ satisfying certain
conditions.



To identify those conditions, we can be guided by the surface $S$ illustrated above.
For a start, we must have

\[
S = S_1 \cup S_2 \cup \cdots \cup S_k,
\]

where each $S_i$ is a surface patch defined by a continuously differentiable
parametrization

\[
\mathbf{f}_i : \Omega_i \to \mathbb{R}^3, \quad U_i \subset \Omega_i, \quad S_i = \mathbf{f}_i(U_i), \quad \partial S_i = \mathbf{f}_i(\partial U_i).
\]

Recall (Definition 10.2, p. 392; see also p. 405) that the domain of a parametrization
is always an open set $\Omega_i$ that extends beyond the closed bounded set $U_i$ that is
mapped to the patch $S_i$.

To ensure that the patches fit together properly, we require that each boundary
$\partial U_i$, and hence each $\partial S_i$, is a piecewise-smooth closed curve (cf. p. 9), made up of
a finite number of smooth arcs. Then we require that any two patches $S_i$ and $S_j$ meet
only along their boundaries, and that any three patches meet in, at most, a finite
number of isolated points. As the figure shows, some arcs are part of the boundary
of two patches; the remaining arcs, each of which lies in only one patch, together
form the boundary, $\partial S$, of $S$. In the following definition, no orientation is assumed,
and “smooth” is used in the sense of continuously differentiable.

Definition 10.7 A subset $S$ of $\mathbb{R}^3$ is a piecewise-smooth surface if it can be
decomposed into a finite number of surface patches that fit together as above.

For example, we can decompose the unit sphere into four surface patches:
$S_1$, eastern belt; $S_2$, western belt; $S_3$, northern cap; $S_4$, southern cap. The figure
in the margin shows the first three patches, separated slightly for clarity. Latitude
and longitude parametrize the belts $S_1$ and $S_2$; only the longitude ranges differ:

\[
\begin{cases}
x = \cos\theta\cos\varphi,\\
y = \sin\theta\cos\varphi,\\
z = \sin\varphi;
\end{cases}
\qquad
U_1:\ 
\begin{aligned}
0 &\le \theta \le \pi,\\
-\alpha &\le \varphi \le \alpha;
\end{aligned}
\qquad
U_2:\ 
\begin{aligned}
-\pi &\le \theta \le 0,\\
-\alpha &\le \varphi \le \alpha.
\end{aligned}
\]

The angle $0 < \alpha < \pi/2$ gives the latitude of the northern boundary of the belts. The
polar caps are just the graphs of

\[
z = \pm\sqrt{1 - x^2 - y^2}, \qquad U_{3,4} : x^2 + y^2 \le \cos^2\alpha.
\]


A surface can be decomposed into a finite number of surface patches in many
ways. Another way to decompose the unit sphere (cf. p. 397) uses just its northern
and southern hemispheres ($S_\pm$):

\[
\mathbf{g}_\pm :
\begin{cases}
x = \dfrac{2u}{1 + u^2 + v^2},\\[6pt]
y = \dfrac{2v}{1 + u^2 + v^2},\\[6pt]
z = \pm\dfrac{1 - u^2 - v^2}{1 + u^2 + v^2};
\end{cases}
\qquad
U_\pm : u^2 + v^2 \le 1.
\]


Because the definition of the integral of a scalar function on a surface patch 
(Definition 10.6) does not depend on any orientation of the patch, we can use it to 
define the integral of a scalar function on an unoriented piecewise-smooth surface. 

Definition 10.8 Suppose $H(x,y,z)$ is a continuous function defined on a
piecewise-smooth surface $S$; if $S = S_1 \cup \cdots \cup S_k$ is a decomposition of $S$ into surface patches,
then

\[
\iint_S H(x,y,z)\, dA = \sum_{i=1}^{k} \iint_{S_i} H(x,y,z)\, dA.
\]

In particular, if $H(x,y,z) = 1$, then the integrals define surface area:

\[
\iint_S dA = \operatorname{area} S = \sum_{i=1}^{k} \operatorname{area} S_i.
\]


A piecewise-smooth surface $S$ has many different decompositions into surface
patches; therefore an integral over $S$ will be well defined only if its value is
independent of that decomposition.

Theorem 10.8. Suppose $S = S_1 \cup \cdots \cup S_k = T_1 \cup \cdots \cup T_m$ gives two decompositions
of the piecewise-smooth surface $S$ into surface patches. Then

\[
\sum_{i=1}^{k} \iint_{S_i} H(x,y,z)\, dA = \sum_{j=1}^{m} \iint_{T_j} H(x,y,z)\, dA.
\]





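Proof. Suppose $S_i$ is parametrized by $(x,y,z) = \mathbf{f}_i(u,v)$, with $\mathbf{f}_i(U_i) = S_i$ and

\[
J_i(u,v) = \sqrt{\left[\frac{\partial(y,z)}{\partial(u,v)}\right]^2
               + \left[\frac{\partial(z,x)}{\partial(u,v)}\right]^2
               + \left[\frac{\partial(x,y)}{\partial(u,v)}\right]^2}
\]

the local area magnification factor for $\mathbf{f}_i$. Then, by definition,

\[
\iint_{S_i} H\, dA = \iint_{U_i} H(\mathbf{f}_i(u,v))\, J_i(u,v)\, du\,dv.
\]

Let $R_{ij} = S_i \cap T_j$; this is a “common refinement” (cf. p. 300) of the decompositions
given by $S_i$ and by $T_j$. (Of course, some sets $R_{ij}$ may be empty.) Because $\mathbf{f}_i$ is
continuous and 1-1 on $U_i$, the sets

\[
(U_i)_j = \mathbf{f}_i^{-1}(R_{ij}) \subset U_i
\]

are closed, bounded, and have area. Also, because $S_i = \bigcup_{j=1}^{m} R_{ij}$ and the $R_{ij}$ are
nonoverlapping,

\[
U_i = \bigcup_{j=1}^{m} (U_i)_j
\]

is a decomposition into nonoverlapping sets. Therefore, when $i$ is fixed, each $R_{ij}$
with $j = 1, \ldots, m$ is a (possibly empty) surface patch parametrized by $\mathbf{f}_i$, with
$R_{ij} = \mathbf{f}_i((U_i)_j)$. Hence,

\[
\iint_{U_i} H(\mathbf{f}_i(u,v))\, J_i(u,v)\, du\,dv
  = \sum_{j=1}^{m} \iint_{(U_i)_j} H(\mathbf{f}_i(u,v))\, J_i(u,v)\, du\,dv
  = \sum_{j=1}^{m} \iint_{R_{ij}} H\, dA,
\]

and thus

\[
\sum_{i=1}^{k} \iint_{S_i} H\, dA = \sum_{i=1}^{k} \sum_{j=1}^{m} \iint_{R_{ij}} H\, dA.
\]

A similar argument, beginning with $T_j$, shows that

\[
\sum_{j=1}^{m} \iint_{T_j} H\, dA = \sum_{j=1}^{m} \sum_{i=1}^{k} \iint_{R_{ij}} H\, dA. \qquad \Box
\]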


By definition, a piecewise-smooth surface can have edges and corners; at such
points, the surface fails to be smooth, that is, to have a well-defined tangent plane.
But edges and corners can occur only where two surface patches of a decomposition
meet. Thus, in our first decomposition above, the unit sphere could fail to be smooth
only at a point on the boundary of one of the belts ($S_1$, $S_2$) or one of the caps
($S_3$, $S_4$). However, these are all interior points of the hemispheres $S_\pm$ of our second
decomposition (except for the two points on the $x$-axis), so they must be smooth
points of the sphere, after all. (The two points on the $x$-axis are interior points of
a third decomposition that uses the polar caps and the equatorial belts rotated 90°
around the $z$-axis.) Because every point on the sphere is interior to some surface
patch in a decomposition, the sphere has no edges or corners: it is smooth.

Definition 10.9 A piecewise-smooth surface $S$ is smooth if every point of $S$ that is
not in $\partial S$ is an interior point of a surface patch that appears in some decomposition
of $S$.

We turn now to the question of orientation. On pages 388-389, we described two
equivalent ways to orient a surface $S$ in space. First, to each point $\mathbf{p}$ in $S$ we assigned
an ordered pair of linearly independent tangent vectors $\{\mathbf{v}_1(\mathbf{p}), \mathbf{v}_2(\mathbf{p})\}$ to $S$ at $\mathbf{p}$ in
such a way that each $\mathbf{v}_i(\mathbf{p})$ varied continuously with $\mathbf{p}$. Second, we assigned to $\mathbf{p}$
one of the unit normals $\mathbf{n}(\mathbf{p})$ to $S$ at $\mathbf{p}$ in such a way that $\mathbf{n}(\mathbf{p})$ varied continuously
with $\mathbf{p}$.

A particular surface may admit no orientation whatsoever. One impediment is
the presence of an edge or a corner, where there is no well-defined tangent plane
or normal vector. It would appear, then, that a surface that is piecewise smooth
but not smooth cannot be oriented. In fact, we see below that this impediment can
sometimes be overcome. A different sort of impediment affects even some smooth
surfaces; the overall shape of the surface may preclude orientation. Perhaps the
simplest example is the Möbius strip.

The following Möbius strip is the smooth surface $M$ formed by the union of two
surface patches $M_1$ and $M_2$ parametrized by the same functions and with adjoining
domains $U_1$ and $U_2$:

\[
\mathbf{f} :
\begin{cases}
x = (5 - v\cos u)\cos 2u,\\
y = (5 - v\cos u)\sin 2u,\\
z = -v\sin u;
\end{cases}
\qquad
U_1:\ 
\begin{aligned}
0 &\le u \le \pi/2,\\
-1 &\le v \le 1;
\end{aligned}
\qquad
U_2:\ 
\begin{aligned}
\pi/2 &\le u \le \pi,\\
-1 &\le v \le 1.
\end{aligned}
\]

The boundary points $(0, v)$ and $(\pi, -v)$ have the same image (although the figure
shows them separated slightly for clarity):

\[
\mathbf{f}(0, v) = (5 - v, 0, 0) = \mathbf{f}(\pi, -v).
\]

In effect, the rectangles $U_1$ and $U_2$ form a single ribbon that $\mathbf{f}$ bends into a loop,
joining one end to the other after giving the ribbon a half-twist. Thus, if we follow
a continuously varying normal vector along the center of the ribbon, we find that
the normals at the two ends, on the seam where the ends of the ribbon join, point in
opposite directions.

This direction reversal is shown by the parametrization normal $\mathbf{N}_\mathbf{f}(u,v)$, whose
components are the continuous functions

\[
\begin{aligned}
\frac{\partial(y,z)}{\partial(u,v)} &= -v\sin 2u - 2(5 - v\cos u)\sin u\cos 2u,\\
\frac{\partial(z,x)}{\partial(u,v)} &= \phantom{-}v\cos 2u - 2(5 - v\cos u)\sin u\sin 2u,\\
\frac{\partial(x,y)}{\partial(u,v)} &= 2(5 - v\cos u)\cos u.
\end{aligned}
\]

At each point $\mathbf{f}(0,v) = \mathbf{f}(\pi,-v)$ on the seam, the parametrization normals are in
conflict:

\[
\mathbf{N}_\mathbf{f}(\pi, -v) = (0, -v, -2(5 - v)) = -\mathbf{N}_\mathbf{f}(0, v).
\]

It is therefore impossible to define a nonzero normal that varies continuously over
all of $M$: the Möbius strip cannot be oriented.

Nevertheless, it is still possible to integrate a scalar function over $M$. In particular,
$M$ has a well-defined area (that we can compute using a numerical integrator, for
example):

\[
\operatorname{area} M = \iint_M dA
  = \iint_{U_1 + U_2} \sqrt{\left[\frac{\partial(y,z)}{\partial(u,v)}\right]^2
      + \left[\frac{\partial(z,x)}{\partial(u,v)}\right]^2
      + \left[\frac{\partial(x,y)}{\partial(u,v)}\right]^2}\; du\,dv
  = \iint_{U_1 + U_2} \sqrt{v^2 + 4(5 - v\cos u)^2}\; du\,dv \approx 62.9377.
\]
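Both claims about the Möbius strip, the reversal of the parametrization normal across the seam and the numerical value of the area, can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the function names `normal` and `mobius_area`, the grid size, and the sample value $v = 0.3$ are illustrative choices, and the midpoint rule stands in for whatever numerical integrator the reader prefers.

```python
import math

def normal(u, v):
    """Parametrization normal N_f(u, v) of the Mobius strip, using the three
    Jacobians d(y,z)/d(u,v), d(z,x)/d(u,v), d(x,y)/d(u,v)."""
    w = 5 - v * math.cos(u)
    return (-v * math.sin(2 * u) - 2 * w * math.sin(u) * math.cos(2 * u),
            v * math.cos(2 * u) - 2 * w * math.sin(u) * math.sin(2 * u),
            2 * w * math.cos(u))

def mobius_area(n=400):
    """Midpoint rule for the integral of |N_f| over 0 <= u <= pi, -1 <= v <= 1."""
    du, dv = math.pi / n, 2.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        for j in range(n):
            v = -1 + (j + 0.5) * dv
            total += math.sqrt(sum(c * c for c in normal(u, v))) * du * dv
    return total

# Normals at the two parameter points that map to the same seam point
n_start = normal(0.0, 0.3)
n_end = normal(math.pi, -0.3)
area = mobius_area()
print(n_start, n_end, area)
```

The two printed normals should be negatives of each other (up to rounding), and the area should agree with the value quoted above to several decimal places.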


Unit normals on 
a smooth surface 


Now suppose that $S$ is a smooth surface and $\mathbf{p}$ is a point in the interior of $S$,
that is, in $S \setminus \partial S$. By definition, $\mathbf{p}$ is in the interior of a surface patch $S^*$ in some
decomposition of $S$. Suppose $\mathbf{f} : \Omega \to \mathbb{R}^3$ parametrizes that patch, with $U \subset \Omega$ and
$\mathbf{f}(U) = S^*$. If $\mathbf{f}(a,b) = \mathbf{p}$, then the parametrization normal

\[
\mathbf{N}_\mathbf{f}(a,b) = \left. \left( \frac{\partial(y,z)}{\partial(u,v)}, \frac{\partial(z,x)}{\partial(u,v)}, \frac{\partial(x,y)}{\partial(u,v)} \right) \right|_{(a,b)}
\]

is nonzero and is normal to $S$ at $\mathbf{p}$. From it we can construct the two unit normals
$\pm\mathbf{n}(\mathbf{p})$ to $S$ at $\mathbf{p}$.

Definition 10.10 Suppose $S$ is a smooth surface that is pathwise connected; $S$ is
said to be orientable if a unit normal vector $\mathbf{n}(\mathbf{p})$ can be chosen that varies
continuously over all points $\mathbf{p}$ in $S \setminus \partial S$. Such an assignment is called an orientation of $S$;
we say that $S$ is oriented, and write $\vec{S}$.

A surface is pathwise connected if any two points can be joined by a continuous path
that lies on the surface. Because an orientable surface $S$ is pathwise connected, the
orienting normal at any point determines the orienting normal at every other point,
by continuity. Thus, $S$ has just two orientations, which we can denote as $\vec{S}$ and $-\vec{S}$.

Every surface patch is orientable, by definition. Any surface patch $S_i$ in a
decomposition of a smooth oriented surface $\vec{S}$ has an orientation induced as a subset of that
surface. We write

\[
\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k
\]

to represent a decomposition of a smooth oriented surface into surface patches with
their induced orientations. We can now define the total flux of a vector field
through a smooth oriented surface.

Definition 10.11 If $\vec{S}$ is a smooth oriented surface and $\mathbf{V} = (X, Y, Z)$ is a vector field
defined on $S$, then the surface integral of $\mathbf{V}$ over $\vec{S}$ is

\[
\iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy = \sum_{i=1}^{k} \iint_{\vec{S}_i} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy,
\]

where $\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k$ is a decomposition into oriented surface patches.

An individual surface integral on the right is computed using a parametrization of
the given oriented surface patch. However, the surface integral over $\vec{S}$ will be well
defined only if the sum on the right is independent of the decomposition of $\vec{S}$ into
oriented surface patches. The following theorem ensures this; it is similar to
Theorem 10.8 and can be proven the same way.

Theorem 10.9. Suppose $\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k = \vec{T}_1 + \cdots + \vec{T}_m$ gives two decompositions
of the smooth oriented surface $\vec{S}$ into oriented surface patches. Then

\[
\sum_{i=1}^{k} \iint_{\vec{S}_i} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy
  = \sum_{j=1}^{m} \iint_{\vec{T}_j} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy. \qquad \Box
\]


To illustrate, let us go back and recalculate the total flux of the field $\mathbf{V} =
(Cx, Cy, Cz)$ through the unit sphere $\vec{S}$ oriented by its outward normal (Example 1,
p. 396), decomposing $\vec{S}$ (as on p. 412) into an eastern belt $\vec{S}_1$, a western belt $\vec{S}_2$,
a northern cap $\vec{S}_3$, and a southern cap $\vec{S}_4$. For the belts $\vec{S}_1$ and $\vec{S}_2$, we can just use
the parametrization from Example 1, replacing the domain $-\pi/2 \le \varphi \le \pi/2$ by
$-\alpha \le \varphi \le \alpha$:

\[
\iint_{\vec{S}_1 + \vec{S}_2} Cx\,dy\,dz + Cy\,dz\,dx + Cz\,dx\,dy
  = \int_{-\pi}^{\pi} C\,d\theta \int_{-\alpha}^{\alpha} \cos\varphi\,d\varphi = (2\pi C)(2\sin\alpha).
\]

We parametrize the northern cap $\vec{S}_3$ as


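\[
\mathbf{f}_3 :
\begin{cases}
x = u,\\
y = v,\\
z = \sqrt{1 - u^2 - v^2},
\end{cases}
\qquad
U_3 : u^2 + v^2 \le \cos^2\alpha;
\]

then

\[
\frac{\partial(y,z)}{\partial(u,v)} = \begin{vmatrix} 0 & 1 \\ -u/z & -v/z \end{vmatrix} = \frac{u}{z}, \qquad
\frac{\partial(z,x)}{\partial(u,v)} = \begin{vmatrix} -u/z & -v/z \\ 1 & 0 \end{vmatrix} = \frac{v}{z}, \qquad
\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1,
\]

and

\[
Cx\,dy\,dz + Cy\,dz\,dx + Cz\,dx\,dy = \left( \frac{Cu^2}{z} + \frac{Cv^2}{z} + Cz \right) du\,dv = \frac{C}{z}\, du\,dv.
\]

Therefore (introducing polar coordinates $u = \rho\cos\theta$, $v = \rho\sin\theta$),

\[
\begin{aligned}
\iint_{\vec{S}_3} Cx\,dy\,dz + Cy\,dz\,dx + Cz\,dx\,dy
  &= \iint_{u^2 + v^2 \le \cos^2\alpha} \frac{C}{\sqrt{1 - u^2 - v^2}}\, du\,dv \\
  &= C \int_0^{2\pi} d\theta \int_0^{\cos\alpha} \frac{\rho\,d\rho}{\sqrt{1 - \rho^2}}
   = 2\pi C \left. \left( -\sqrt{1 - \rho^2} \right) \right|_0^{\cos\alpha} \\
  &= 2\pi C (1 - \sin\alpha).
\end{aligned}
\]
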

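If we were to parametrize the southern cap $\vec{S}_4$ by just changing the sign of $z$,

\[
\mathbf{g}_4 :
\begin{cases}
x = u,\\
y = v,\\
z = -\sqrt{1 - u^2 - v^2},
\end{cases}
\qquad
U_4 : u^2 + v^2 \le \cos^2\alpha,
\]

then we would have

\[
\frac{\partial(y,z)}{\partial(u,v)} = \begin{vmatrix} 0 & 1 \\ -u/z & -v/z \end{vmatrix} = \frac{u}{z}, \qquad
\frac{\partial(z,x)}{\partial(u,v)} = \begin{vmatrix} -u/z & -v/z \\ 1 & 0 \end{vmatrix} = \frac{v}{z}, \qquad
\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1.
\]

These are the components of the orientation normal

\[
\mathbf{N}_{\mathbf{g}_4} = \left( \frac{u}{z}, \frac{v}{z}, 1 \right) = \frac{1}{z}(u, v, z) = \frac{1}{z}(x, y, z).
\]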


g 4 parametrizes 

-S 4 , not +S 4 


But because $1/z < 0$, this is a negative multiple of the radius vector $(x, y, z)$ at a point
on $S_4$, and hence points inward. However, the orientation of $\vec{S}$ requires an outward
normal here. The remedy is to reverse $u$ and $v$:

\[
\mathbf{f}_4 :
\begin{cases}
x = v,\\
y = u,\\
z = -\sqrt{1 - u^2 - v^2},
\end{cases}
\qquad
U_4 : u^2 + v^2 \le \cos^2\alpha.
\]

Now

\[
\frac{\partial(y,z)}{\partial(u,v)} = \begin{vmatrix} 1 & 0 \\ -u/z & -v/z \end{vmatrix} = -\frac{v}{z}, \qquad
\frac{\partial(z,x)}{\partial(u,v)} = \begin{vmatrix} -u/z & -v/z \\ 0 & 1 \end{vmatrix} = -\frac{u}{z}, \qquad
\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1,
\]

and

\[
\mathbf{N}_{\mathbf{f}_4} = -\frac{1}{z}(v, u, z) = -\frac{1}{z}(x, y, z).
\]

This is a positive multiple of the radius vector $(x, y, z)$, and hence an outward normal.
Furthermore,

\[
Cx\,dy\,dz + Cy\,dz\,dx + Cz\,dx\,dy = -\left( \frac{Cv^2}{z} + \frac{Cu^2}{z} + Cz \right) du\,dv
  = -\frac{C}{z}\, du\,dv = +\frac{C}{\sqrt{1 - u^2 - v^2}}\, du\,dv,
\]

as with $\vec{S}_3$, so

\[
\iint_{\vec{S}_4} Cx\,dy\,dz + Cy\,dz\,dx + Cz\,dx\,dy = 2\pi C (1 - \sin\alpha).
\]

Total flux through $\vec{S}_4$ equals total flux through $\vec{S}_3$, as is already evident by symmetry.
Total flux out of the whole sphere is therefore

\[
\iint_{\vec{S}} Cx\,dy\,dz + Cy\,dz\,dx + Cz\,dx\,dy = 4\pi C\sin\alpha + 2 \cdot 2\pi C(1 - \sin\alpha) = 4\pi C,
\]

precisely the value we found earlier, when we assumed the whole sphere could be
treated as if it were a single surface patch. Although this analysis does not justify
that assumption, it does show why the earlier computation worked. As $\alpha \to \pi/2$,
$\sin\alpha \to 1$; therefore, total flux through the polar caps approaches 0 and total flux
through the two belts approaches $4\pi C$. These conclusions are also clear on physical
grounds.
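The bookkeeping in this example is easy to confirm numerically. In the sketch below (an illustration, not the text's own computation), the names `cap_flux` and `belt_flux` and the sample values $C = 3$, $\alpha = \pi/3$ are assumptions; the cap integral is evaluated by the midpoint rule in the polar radius $\rho$, while the belts use the closed form derived above.

```python
import math

def cap_flux(C, alpha, n=2000):
    """Flux of (Cx, Cy, Cz) through one polar cap:
    2*pi*C times the integral of rho / sqrt(1 - rho^2) over 0 <= rho <= cos(alpha)."""
    h = math.cos(alpha) / n
    integral = 0.0
    for k in range(n):
        rho = (k + 0.5) * h
        integral += rho / math.sqrt(1 - rho * rho) * h
    return 2 * math.pi * C * integral

def belt_flux(C, alpha):
    """Flux through the two belts together, from the closed form (2*pi*C)(2 sin(alpha))."""
    return 2 * math.pi * C * 2 * math.sin(alpha)

C, alpha = 3.0, math.pi / 3
one_cap = cap_flux(C, alpha)
total = belt_flux(C, alpha) + 2 * one_cap
print(one_cap, 2 * math.pi * C * (1 - math.sin(alpha)))
print(total, 4 * math.pi * C)
```

Changing `alpha` moves flux between the belts and the caps, but the total should stay at $4\pi C$.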

At an edge or a corner, a piecewise-smooth surface does not have a well-defined
normal (or a pair of linearly independent tangent vectors), so it cannot be
oriented the same way as a smooth surface. However, a surface patch is always
orientable, and because the boundary of any surface patch used in a decomposition
of a piecewise-smooth surface is a piecewise-smooth curve, the orientation of the
patch induces an orientation of its boundary. As we see in the figure below, it may
be possible to orient all the surface patches in a decomposition so that their common
boundary arcs have opposite orientations and thus cancel each other.



Orientability of a 

piecewise-smooth 

surface 


Surface integrals 


Definition 10.12 Suppose $S$ is a piecewise-smooth surface that is pathwise
connected, and $S = \vec{S}_1 \cup \cdots \cup \vec{S}_k$ is a decomposition into oriented surface elements.
Suppose that whenever two surface elements $\vec{S}_i$ and $\vec{S}_j$ have a common boundary
arc, $\partial\vec{S}_i$ and $\partial\vec{S}_j$ have opposite orientations there. Then we say $S$ is orientable and
is oriented by those surface elements. We write

\[
\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k, \qquad \partial\vec{S} = \partial\vec{S}_1 + \cdots + \partial\vec{S}_k.
\]

The equation for $\partial\vec{S}$ reflects the cancellations that occur on the arcs that pairs of
different $\partial\vec{S}_i$ have in common. The unpaired arcs that remain have a well-defined
orientation and make up the oriented boundary of $\vec{S}$. The surface integral of a vector
field over a piecewise-smooth oriented surface is defined exactly as for a smooth
oriented surface; moreover, the definition is independent of the way the surface is
decomposed into patches.

Definition 10.13 If $\vec{S}$ is a piecewise-smooth oriented surface and $\mathbf{V} = (X, Y, Z)$ is a
vector field defined on $S$, then the surface integral of $\mathbf{V}$ over $\vec{S}$ is

\[
\iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy = \sum_{i=1}^{k} \iint_{\vec{S}_i} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy,
\]

where $\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k$ is a decomposition into oriented surface patches.

Theorem 10.10. Suppose $\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k = \vec{T}_1 + \cdots + \vec{T}_m$ gives two decompositions
of the piecewise-smooth oriented surface $\vec{S}$ into oriented surface patches. Then

\[
\sum_{i=1}^{k} \iint_{\vec{S}_i} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy
  = \sum_{j=1}^{m} \iint_{\vec{T}_j} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy. \qquad \Box
\]

To illustrate, let us determine the total flux of $\mathbf{V} = (x + y, y - x, 0)$ out of the
unit cube in $(x,y,z)$-space. Thus, we orient each face of the cube with the outward
normal, show this gives us an orientation of the surface $\vec{S}$ of the entire cube, integrate
$\mathbf{V}$ over each face, and then add the results.

We can parametrize the six faces with affine functions, but it takes some care to
get the orientations right. To parametrize $\vec{S}_{x=1}$, the face that lies in the plane $x = 1$,
let $\vec{U}$ be the positively oriented unit square in the $(u,v)$-plane, and let $\mathbf{f}_{x=1} : \vec{U} \to \mathbb{R}^3$
be $(x,y,z) = (1, u, v)$. The orientation normal is

\[
\mathbf{N}_{x=1} = \left( \frac{\partial(y,z)}{\partial(u,v)}, \frac{\partial(z,x)}{\partial(u,v)}, \frac{\partial(x,y)}{\partial(u,v)} \right) = (1, 0, 0);
\]

it does indeed point out of the cube on $S_{x=1}$. Total flux through $\vec{S}_{x=1}$ is

\[
\iint_{\vec{S}_{x=1}} (x + y)\,dy\,dz + (y - x)\,dz\,dx = \iint_{\vec{U}} (1 + u) \times 1\; du\,dv
  = \int_0^1 dv \int_0^1 (1 + u)\,du = \frac{3}{2}.
\]

To parametrize the face $\vec{S}_{x=0}$ that lies in the plane $x = 0$, suppose we were to use
$\mathbf{g} : (x,y,z) = (0, u, v)$ with $(u, v)$ again in the oriented unit square $\vec{U}$. The orientation
normal is the same as on $S_{x=1}$,

\[
\mathbf{N}_\mathbf{g} = (1, 0, 0),
\]

but $S_{x=0}$ is on the other side of the cube, so $\mathbf{N}_\mathbf{g}$ points into the cube. Thus $\mathbf{g}$ gives
the wrong orientation. We get the correct orientation by reversing $u$ and $v$: let $\mathbf{f}_{x=0} :
(x,y,z) = (0, v, u)$. Then

\[
\frac{\partial(y,z)}{\partial(u,v)} = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1,
\]

leading to an orientation normal

\[
\mathbf{N}_{x=0} = (-1, 0, 0)
\]

that points out of the cube. With $\mathbf{f}_{x=0}$ we find that total flux through $\vec{S}_{x=0}$ is

\[
\iint_{\vec{S}_{x=0}} (x + y)\,dy\,dz + (y - x)\,dz\,dx = \iint_{\vec{U}} (0 + v) \times (-1)\; du\,dv
  = \int_0^1 du \int_0^1 -v\,dv = -\frac{1}{2}.
\]



To parametrize the face $\vec{S}_{y=1}$, a bit of experimentation suggests that we use $\mathbf{f}_{y=1} :
(x,y,z) = (v, 1, u)$. Then

\[
\frac{\partial(z,x)}{\partial(u,v)} = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1,
\]

giving an orientation normal

\[
\mathbf{N}_{y=1} = (0, 1, 0)
\]

that points out of the cube. Total flux through $\vec{S}_{y=1}$ is

\[
\iint_{\vec{S}_{y=1}} (x + y)\,dy\,dz + (y - x)\,dz\,dx = \iint_{\vec{U}} (1 - v) \times 1\; du\,dv
  = \int_0^1 du \int_0^1 (1 - v)\,dv = \frac{1}{2}.
\]


The parametrization $\mathbf{f}_{y=0} : (x,y,z) = (u, 0, v)$ of the face $\vec{S}_{y=0}$ has the orientation
normal

\[
\mathbf{N}_{y=0} = (0, -1, 0)
\]

that points out of the cube. Total flux through $\vec{S}_{y=0}$ is

\[
\iint_{\vec{S}_{y=0}} (x + y)\,dy\,dz + (y - x)\,dz\,dx = \iint_{\vec{U}} (0 - u) \times (-1)\; du\,dv
  = \int_0^1 dv \int_0^1 u\,du = \frac{1}{2}.
\]

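We can parametrize $\vec{S}_{z=1}$ with $\mathbf{f}_{z=1} : (x,y,z) = (u, v, 1)$ and $\vec{S}_{z=0}$ with $\mathbf{f}_{z=0} :
(x,y,z) = (v, u, 0)$. Then

\[
\mathbf{N}_{z=1} = (0, 0, 1), \qquad \mathbf{N}_{z=0} = (0, 0, -1),
\]

as required. However, total flux is zero through both faces:

\[
\iint_{\vec{S}_{z=1,0}} (x + y)\,dy\,dz + (y - x)\,dz\,dx = \iint_{\vec{U}} 0\; du\,dv = 0.
\]

This is already clear because the flow is everywhere parallel to the $(x,y)$-plane, so
$\mathbf{V} \cdot \mathbf{N}_{z=1,0} = 0$. Addition now gives us the total flux out of the whole cubical surface $\vec{S}$:

\[
\iint_{\vec{S}} (x + y)\,dy\,dz + (y - x)\,dz\,dx = \frac{3}{2} - \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + 0 + 0 = 2.
\]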
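The six face integrals can be tallied mechanically. The following sketch is illustrative: the helper name `face_flux`, the dictionary layout, and the grid size are assumptions. Each face is described by its affine parametrization over the unit square together with the constant Jacobians $\partial(y,z)/\partial(u,v)$ and $\partial(z,x)/\partial(u,v)$ that the chosen orientation produces; the third Jacobian is not needed because the $z$-component of $\mathbf{V}$ is zero.

```python
def face_flux(param, jac_yz, jac_zx, n=100):
    """Flux of V = (x + y, y - x, 0) through one face of the unit cube.

    param(u, v) returns (x, y, z) on the face for (u, v) in the unit square;
    jac_yz and jac_zx are the constant Jacobians d(y,z)/d(u,v), d(z,x)/d(u,v).
    The integral is approximated by the midpoint rule.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            x, y, z = param(u, v)
            total += ((x + y) * jac_yz + (y - x) * jac_zx) * h * h
    return total

# Parametrizations and Jacobians for the outward-oriented faces, as in the text
faces = {
    "x=1": (lambda u, v: (1.0, u, v), 1, 0),
    "x=0": (lambda u, v: (0.0, v, u), -1, 0),
    "y=1": (lambda u, v: (v, 1.0, u), 0, 1),
    "y=0": (lambda u, v: (u, 0.0, v), 0, -1),
    "z=1": (lambda u, v: (u, v, 1.0), 0, 0),
    "z=0": (lambda u, v: (v, u, 0.0), 0, 0),
}
flux = {name: face_flux(*data) for name, data in faces.items()}
total = sum(flux.values())
print(flux, total)
```

Because every integrand here is linear in $u$ and $v$, the midpoint rule reproduces the exact face values up to rounding.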


Alternate form for 
a surface integral 


The integral of the vector field $\mathbf{V} = (X, Y, Z)$ over an oriented piecewise-smooth
surface $\vec{S}$ has the alternate form

\[
\iint_S \mathbf{V} \cdot \mathbf{n}\, dA = \iint_{\vec{S}} X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy
\]

that integrates the scalar function $\mathbf{V} \cdot \mathbf{n}$ over the unoriented surface $S$. If $S$ is smooth,
the unit normal $\mathbf{n}$ that appears here is the one that defines the orientation of $\vec{S}$. If $S$
is only piecewise-smooth, then $\mathbf{n}$ is not defined everywhere. But if $\vec{S} = \vec{S}_1 + \cdots + \vec{S}_k$
is a decomposition into oriented surface patches, then we define

\[
\iint_S \mathbf{V} \cdot \mathbf{n}\, dA = \sum_{i=1}^{k} \iint_{S_i} \mathbf{V} \cdot \mathbf{n}_i\, dA,
\]

where $\mathbf{n}_i$ is the orienting unit normal on $\vec{S}_i$. (On the interior of $S_i$, $\mathbf{n}_i$ is the orienting
unit normal on $\vec{S}$ as well.)

10.3 Differential forms

The integrands of path and surface integrals, and of oriented single and double
integrals, are differential forms. We generate new forms using algebraic operations
and differentiation; in particular, these operations give us a simple connection
between the forms that appear in the path and double integrals of Green’s theorem. The
books by H. Flanders [7] and H. Edwards [5] provide more extensive treatments of
differential forms.

To fix ideas, we begin with differential forms in $\mathbb{R}^3$. In $(x,y,z)$-space, there are
three “basic” differentials: $dx$, $dy$, and $dz$. A differential $k$-form, or an exterior
$k$-form, or just a $k$-form $\alpha = \alpha(x,y,z)$, is a sum of “monomials” that contain exactly
$k$ of these differentials, as follows:

  k      general k-form
  0      $g(x,y,z)$
  1      $P(x,y,z)\,dx + Q(x,y,z)\,dy + R(x,y,z)\,dz$
  2      $X(x,y,z)\,dy\,dz + Y(x,y,z)\,dz\,dx + Z(x,y,z)\,dx\,dy$
  3      $H(x,y,z)\,dx\,dy\,dz$
  >3     $0$

A general $k$-form is thus a linear combination of certain basic $k$-forms

\[
1, \quad dx,\ dy,\ dz, \quad dy\,dz,\ dz\,dx,\ dx\,dy, \quad dx\,dy\,dz;
\]

we require the coefficient functions to have continuous second derivatives. A 1-form
is the integrand of a path integral, so it is integrated over an oriented 1-dimensional
domain. A 2-form is integrated over an oriented 2-dimensional domain, and a 3-form
is integrated over an oriented 3-dimensional domain. Even a 0-form fits this pattern;
see below, pages 428-429.

The sum of two $k$-forms is another $k$-form in the usual way, and the product of a
$k$-form by a function is another $k$-form. We do not define the sum of a $k$-form and an
$l$-form when $k \ne l$; for one thing, such a sum could not be an integrand. However,
we can define the product of a $k$-form $\alpha$ and an $l$-form $\beta$. It is a $(k + l)$-form $\alpha \wedge \beta$,
called the exterior, or wedge, product of $\alpha$ and $\beta$. On the basic differentials, the
exterior product is anticommutative:

\[
\begin{aligned}
dx \wedge dy &= -dy \wedge dx = dx\,dy,\\
dy \wedge dz &= -dz \wedge dy = dy\,dz,\\
dz \wedge dx &= -dx \wedge dz = dz\,dx,\\
dx \wedge dx &= dy \wedge dy = dz \wedge dz = 0.
\end{aligned}
\]

Anticommutativity implies the last line; for example, interchanging the first $dx$ with
the second gives $dx \wedge dx = -dx \wedge dx$, so $2(dx \wedge dx) = 0$.



The definition says that each basic 2-form is just an exterior product; for example,
$dx\,dy$ stands for $dx \wedge dy$. For the basic 3-form, we have

\[
\begin{aligned}
dx\,dy\,dz &= dx \wedge dy \wedge dz = dy \wedge dz \wedge dx = dz \wedge dx \wedge dy\\
  &= -dy \wedge dx \wedge dz = -dz \wedge dy \wedge dx = -dx \wedge dz \wedge dy.
\end{aligned}
\]

Because anticommutativity forces the exterior product of basic differentials to be
zero unless they are distinct, there is no nonzero $k$-form in $\mathbb{R}^3$ when $k > 3$. Note that
the exterior product is not anticommutative in all cases:

\[
(dy\,dz) \wedge dx = dx\,dy\,dz = dx \wedge (dy\,dz).
\]

For completeness, we define the exterior product with a 0-form (that is, an ordinary
function $g = g(x,y,z)$) as

\[
g \wedge \alpha = \alpha \wedge g = g(x,y,z)\,\alpha(x,y,z)
\]

for any $k$-form $\alpha$.

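We can now compute the exterior product of any two forms by using the
distributive law. For the 1-forms

\[
\alpha = \alpha(x,y,z) = P\,dx + Q\,dy + R\,dz, \qquad \theta = \theta(x,y,z) = F\,dx + G\,dy + H\,dz,
\]

we have

\[
\begin{aligned}
\alpha \wedge \theta &= (P\,dx + Q\,dy + R\,dz) \wedge (F\,dx + G\,dy + H\,dz)\\
  &= PG\,dx \wedge dy + PH\,dx \wedge dz + QF\,dy \wedge dx\\
  &\quad + QH\,dy \wedge dz + RF\,dz \wedge dx + RG\,dz \wedge dy\\
  &= (QH - RG)\,dy\,dz + (RF - PH)\,dz\,dx + (PG - QF)\,dx\,dy.
\end{aligned}
\]

A similar calculation of $\theta \wedge \alpha$ would then show that $\theta \wedge \alpha = -\alpha \wedge \theta$. For the 2-form

\[
\beta = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy,
\]

the wedge product $\alpha \wedge \beta$ has only three nonzero terms:

\[
\alpha \wedge \beta = (PX + QY + RZ)\,dx\,dy\,dz = \beta \wedge \alpha.
\]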

Suppose $\vec{C}$ is an oriented curve that we parametrize as

\[
\mathbf{x}(t) = (x(t), y(t), z(t)), \quad a \le t \le b.
\]

Consider the simple 1-form $\alpha = dx$ on $\vec{C}$; we have

\[
\int_{\vec{C}} \alpha = \int_{\vec{C}} dx = \int_a^b x'(t)\,dt = x(t) \Big|_a^b = x(b) - x(a),
\]

which is the change in $x$ along $\vec{C}$:

\[
\int_{\vec{C}} dx = \Delta x \text{ along } \vec{C}.
\]

Now let $\alpha = g_x\,dx + g_y\,dy + g_z\,dz$, where $g_x$, $g_y$, and $g_z$ are the partial derivatives of a
continuously differentiable function $g(x,y,z)$. (We call $g$ a potential function for $\alpha$;
cf. p. 25.) Then, using the chain rule to convert $g_x x' + g_y y' + g_z z'$ into $dg(\mathbf{x}(t))/dt$,
we have

\[
\begin{aligned}
\int_{\vec{C}} g_x\,dx + g_y\,dy + g_z\,dz &= \int_a^b (g_x x' + g_y y' + g_z z')\,dt\\
  &= \int_a^b \frac{d}{dt}\, g(\mathbf{x}(t))\,dt = g(\mathbf{x}(t)) \Big|_a^b\\
  &= g(\mathbf{x}(b)) - g(\mathbf{x}(a)),
\end{aligned}
\]

which is the change in $g$ along $\vec{C}$. Analogy with the simple differential $dx$ suggests
we set

\[
g_x\,dx + g_y\,dy + g_z\,dz = dg,
\]

and call this the differential of $g$, because then we have

\[
\int_{\vec{C}} dg = \Delta g \text{ along } \vec{C}.
\]

Using the fact that $dg$ is defined for any 0-form $g$, we now define the differential,
or exterior derivative, $d\alpha$ of any $k$-form $\alpha$. There are two rules. First, if $\alpha = g \wedge \beta$,
where $\beta$ is a basic $k$-form, then

\[
d\alpha = dg \wedge \beta,
\]

a $(k+1)$-form. Second, for any $k$-forms $\alpha$ and $\omega$,

\[
d(\alpha \pm \omega) = d\alpha \pm d\omega.
\]

It follows that the exterior derivative of any $k$-form is a $(k+1)$-form.

For a general 1-form $\alpha = P\,dx + Q\,dy + R\,dz$, we have

\[
\begin{aligned}
d\alpha &= dP \wedge dx + dQ \wedge dy + dR \wedge dz\\
  &= (P_x\,dx + P_y\,dy + P_z\,dz) \wedge dx + (Q_x\,dx + Q_y\,dy + Q_z\,dz) \wedge dy\\
  &\quad + (R_x\,dx + R_y\,dy + R_z\,dz) \wedge dz\\
  &= (R_y - Q_z)\,dy\,dz + (P_z - R_x)\,dz\,dx + (Q_x - P_y)\,dx\,dy.
\end{aligned}
\]

For a general 2-form $\omega = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy$, the calculation is briefer:

\[
d\omega = (X_x + Y_y + Z_z)\,dx\,dy\,dz.
\]


For a 3-form $\gamma = H\,dx\,dy\,dz$, the exterior derivative $d\gamma$ is a 4-form, and hence is
automatically 0.

Theorem 10.11. For every $k$-form $\alpha$ in $\mathbb{R}^3$, $d^2\alpha = d(d\alpha) = 0$.

Proof. Suppose $\alpha$ is a 0-form, $\alpha = g$; then

\[
\begin{aligned}
d^2\alpha &= d(g_x\,dx + g_y\,dy + g_z\,dz)\\
  &= (g_{xy}\,dy + g_{xz}\,dz) \wedge dx + (g_{yx}\,dx + g_{yz}\,dz) \wedge dy + (g_{zx}\,dx + g_{zy}\,dy) \wedge dz\\
  &= (g_{zy} - g_{yz})\,dy\,dz + (g_{xz} - g_{zx})\,dz\,dx + (g_{yx} - g_{xy})\,dx\,dy\\
  &= 0.
\end{aligned}
\]

All the coefficients vanish by the “equality of mixed partials” for functions with
continuous second derivatives.

Suppose $\alpha$ is a 1-form: $\alpha = P\,dx + Q\,dy + R\,dz$; then (see above)

\[
d\alpha = (R_y - Q_z)\,dy\,dz + (P_z - R_x)\,dz\,dx + (Q_x - P_y)\,dx\,dy.
\]

This is a 2-form whose exterior derivative is

\[
\begin{aligned}
d^2\alpha &= \bigl( (R_y - Q_z)_x + (P_z - R_x)_y + (Q_x - P_y)_z \bigr)\,dx\,dy\,dz\\
  &= (R_{yx} - Q_{zx} + P_{zy} - R_{xy} + Q_{xz} - P_{yz})\,dx\,dy\,dz\\
  &= 0.
\end{aligned}
\]

Again the “equality of mixed partials” implies that the coefficient vanishes. Finally,
if $\alpha$ is a $k$-form with $k \ge 2$, then $d^2\alpha$ is a $(k+2)$-form and hence vanishes
automatically. Thus $d^2 = 0$ on all differential forms. □
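A concrete instance may help fix the pattern. The 1-form below is an illustrative choice of coefficients, not an example from the text; the computation simply applies the formulas for $d\alpha$ and $d\omega$ stated earlier in this section.

```latex
\begin{align*}
\alpha &= xy\,dx + yz\,dy + zx\,dz, \qquad P = xy,\quad Q = yz,\quad R = zx,\\
d\alpha &= (R_y - Q_z)\,dy\,dz + (P_z - R_x)\,dz\,dx + (Q_x - P_y)\,dx\,dy\\
        &= (0 - y)\,dy\,dz + (0 - z)\,dz\,dx + (0 - x)\,dx\,dy\\
        &= -y\,dy\,dz - z\,dz\,dx - x\,dx\,dy,\\
d^2\alpha &= \bigl( (-y)_x + (-z)_y + (-x)_z \bigr)\,dx\,dy\,dz = (0 + 0 + 0)\,dx\,dy\,dz = 0.
\end{align*}
```

Here $d\alpha \ne 0$, so $\alpha$ is not the differential of any function $g$, yet $d^2\alpha$ still vanishes, as the theorem requires.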

Note that $d^2 = 0$ is a consequence of the anticommutativity of the exterior product on basic differentials. For example, in

\[
d^2 g = d(g_x\,dx + g_y\,dy + g_z\,dz),
\]

the first term contributes

\[
g_{xy}\,dy \wedge dx
\]

and the second contributes

\[
g_{yx}\,dx \wedge dy = g_{xy}\,dx \wedge dy = -g_{xy}\,dy \wedge dx.
\]

The anticommutativity, in turn, is a reflection of the fact that $dy \wedge dx$ represents an oriented element of area for a double integral, and $dx \wedge dy$ represents the element of area for the opposite orientation. All these relations are nicely illustrated by differential forms in the plane.

In the $(x,y)$-plane, there are just four “basic differentials,”

\[
1, \quad dx, \quad dy, \quad dx\,dy,
\]

and the general 1-form and 2-form are, respectively,

\[
\alpha(x,y) = P(x,y)\,dx + Q(x,y)\,dy \quad\text{and}\quad \theta(x,y) = H(x,y)\,dx\,dy.
\]

The exterior derivative is defined as for forms in $\mathbb{R}^3$; the exterior derivative of the 1-form $\alpha = P\,dx + Q\,dy$ is

\[
\begin{aligned}
d\alpha &= dP \wedge dx + dQ \wedge dy \\
&= (P_x\,dx + P_y\,dy) \wedge dx + (Q_x\,dx + Q_y\,dy) \wedge dy \\
&= (Q_x - P_y)\,dx\,dy.
\end{aligned}
\]


This means that Green’s theorem,

\[
\oint_{\partial R} P\,dx + Q\,dy = \iint_R (Q_x - P_y)\,dx\,dy,
\]

becomes, in the language of differential forms,

\[
\oint_{\partial R} \alpha = \iint_R d\alpha.
\]

We can write this equation in an even more striking way by regarding an oriented integral as a function, or map, that assigns a number (the value of the integral) to each pair of objects of a particular sort: the first object is an oriented $k$-dimensional region $D$; the second is a $k$-form $\omega$. To emphasize how an integral is the “pairing” of a region and a form, let us write it in symbolic fashion as

\[
\langle D, \omega \rangle.
\]

For example, $D$ could be the interval $[a,b]$ and $\omega(x) = g(x)\,dx$; then

\[
\langle D, \omega \rangle = \langle [a,b],\, g(x)\,dx \rangle = \int_a^b g(x)\,dx.
\]

But $D$ could just as well be a piecewise-smooth oriented surface in space and $\omega(x,y,z) = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy$; then (Definition 10.13)

\[
\langle D, \omega \rangle = \iint_D X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy.
\]

In terms of this symbolic pairing, Green’s theorem has the form

\[
\langle \partial R, \alpha \rangle = \langle R, d\alpha \rangle.
\]


The operator $d$ assigns the 2-form $d\alpha$ to the 1-form $\alpha$; the operator $\partial$ (the symbol is a cursive “d” in the Cyrillic alphabet) assigns the 1-dimensional region $\partial R$ to the 2-dimensional region $R$. The symbolic content of Green’s theorem is that each of these operators turns into the other when it “moves across” the pairing. In this context, we say that the operators $d$ and $\partial$ are adjoints. The boundary operator is written as a $\partial$ (albeit as a Russian d) because it is the adjoint of the exterior derivative, $d$.

The adjoint relation between $d$ and $\partial$ supports the fact that $d^2 = 0$ (Theorem 10.11), because it is independently clear that $\partial^2 = 0$ (i.e., the boundary of a boundary is empty: $\partial(\partial D) = \emptyset$).


Green’s theorem, in its symbolic form $\langle \partial R, \alpha \rangle = \langle R, d\alpha \rangle$, has a remarkable 1-dimensional analogue. On the $x$-axis, there are just two basic differential forms, 1 and $dx$, generating the 0-forms $G(x)$ and 1-forms $g(x)\,dx$. In the fundamental theorem of calculus,

\[
\int_a^b G'(x)\,dx = G(b) - G(a),
\]

the left-hand side is the integral of the 1-form $d\alpha = G'\,dx$, where $\alpha$ itself is the 0-form $\alpha = G$. Can we make the right-hand side into a kind of “0-dimensional integral” of $\alpha$?

The basic 0-dimensional object is a single point. But the fundamental theorem involves a pair of points, the boundary points of $[a,b]$. To include this case, we take a 0-dimensional “region” to be any finite collection of points $D : \{a_1, a_2, \ldots, a_n\}$. Because a 0-form $G(x)$ takes a value on each of these points (and integrals represent sums), one possibility is to define the symbolic 0-dimensional integral to be

\[
\langle D, G \rangle = G(a_1) + G(a_2) + \cdots + G(a_n).
\]

However, this fails to produce the minus sign we see in $G(b) - G(a)$.

To get the needed minus sign, we introduce orientation. Because $I = [a,b]$ is itself oriented, we say it induces the orientation $\{-a, +b\}$ on $\partial I$. The signs convey the orientation: the minus sign indicates where $I$ begins; the plus sign where it ends.

[Figure: the oriented intervals $+I$ and $-I$ on $[a,b]$, with the induced boundary orientations $+\partial I : \{-a, +b\}$ and $-\partial I : \{+a, -b\}$.]

The oppositely oriented $-I$ induces the correct orientation of $-\partial I : \{+a, -b\}$, regarded as $\partial I$ with the opposite orientation. We convey the same information about $\partial I$ if we write it not as a set but as a sum:

\[
\partial I = -a + b.
\]


To see the advantage of this change, let

\[
J = [b,c], \qquad K = I + J = [a,b] + [b,c] = [a,c].
\]

Then $\partial J = -b + c$ and

\[
\partial I + \partial J = -a + b - b + c = -a + c = \partial K.
\]

Also, $\partial(I - I) = -a + b + a - b = 0$, which is consistent with $I - I = 0$. We can therefore convert a general $D : \{a_1, \ldots, a_l\}$ into an oriented 0-dimensional region by attaching an integer $m_i$ to each $a_i$, and writing the result as

\[
D = m_1 a_1 + \cdots + m_l a_l.
\]

If we change either a point $a_i$ or a “weight” $m_i$, then $D$ is a different oriented 0-dimensional region. Finally, if $G$ is a 0-form, we define the 0-dimensional oriented integral

\[
\langle D, G \rangle = m_1 G(a_1) + \cdots + m_l G(a_l).
\]

In these terms, the fundamental theorem of calculus takes the form

\[
\langle I, d\alpha \rangle = \langle \partial I, \alpha \rangle,
\]

where $\alpha = G$ is a 0-form and $I$ is an oriented interval. The fundamental theorem and Green’s theorem thus make the same assertion about a $k$-dimensional region and a $k$-form, only for different values of $k$. They are instances of a more general result, called Stokes’ theorem, that we consider in the next chapter.
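The bookkeeping for oriented 0-dimensional regions can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the helper names `chain`, `boundary`, `add`, `pair`, and `integrate` are hypothetical, a 0-dimensional region is stored as a dictionary from points to integer weights, and the 1-dimensional integral is approximated by a midpoint rule.

```python
def chain(*terms):
    """Build an oriented 0-dimensional region from (weight, point) pairs."""
    weights = {}
    for weight, point in terms:
        weights[point] = weights.get(point, 0) + weight
    # Drop points whose weights cancel, so that -b + b disappears.
    return {p: m for p, m in weights.items() if m != 0}


def boundary(a, b):
    """Boundary of the oriented interval [a, b], written as the sum -a + b."""
    return chain((-1, a), (+1, b))


def add(c1, c2):
    """Sum of two oriented 0-dimensional regions."""
    return chain(*[(m, p) for c in (c1, c2) for p, m in c.items()])


def pair(region, G):
    """The pairing <D, G> = m_1 G(a_1) + ... + m_l G(a_l)."""
    return sum(m * G(p) for p, m in region.items())


def integrate(g, a, b, n=10000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


G = lambda x: x ** 3          # a sample 0-form (our choice)
dG = lambda x: 3 * x ** 2     # coefficient of the 1-form dG = G'(x) dx

dI, dJ = boundary(1, 2), boundary(2, 5)
dK = add(dI, dJ)              # the shared endpoint 2 cancels
lhs = integrate(dG, 1, 5)     # approximates <K, dG> with K = [1, 5]
rhs = pair(dK, G)             # <dK, G> = G(5) - G(1)
```

Here `dK` contains only the endpoints 1 and 5, mirroring $\partial I + \partial J = \partial K$, and `lhs` agrees with `rhs` up to the error of the midpoint rule.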

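What happens to a differential form under a change of variables? For example, consider the change to polar coordinates with

\[
\begin{cases} x = r\cos\theta, \\ y = r\sin\theta. \end{cases}
\]

Then, because we can treat $x$ and $y$ as functions of $r$ and $\theta$,

\[
\begin{cases} dx = \cos\theta\,dr - r\sin\theta\,d\theta, \\ dy = \sin\theta\,dr + r\cos\theta\,d\theta. \end{cases}
\]

The element of area, $dx\,dy$, is transformed into

\[
\begin{aligned}
dx\,dy &= (\cos\theta\,dr - r\sin\theta\,d\theta) \wedge (\sin\theta\,dr + r\cos\theta\,d\theta) \\
&= r\cos^2\theta\,dr \wedge d\theta - r\sin^2\theta\,d\theta \wedge dr \\
&= r\,dr\,d\theta,
\end{aligned}
\]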


the element of area in polar coordinates. For a second example, consider the 1-form

\[
\alpha(x,y) = \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy.
\]

If we use $\varphi^*\alpha(r,\theta)$ to denote the new form after the polar coordinate change $\varphi$ is applied, then

\[
\begin{aligned}
\varphi^*\alpha(r,\theta) &= \frac{-r\sin\theta}{r^2}(\cos\theta\,dr - r\sin\theta\,d\theta) + \frac{r\cos\theta}{r^2}(\sin\theta\,dr + r\cos\theta\,d\theta) \\
&= \frac{-\sin\theta\cos\theta + \sin\theta\cos\theta}{r}\,dr + (\sin^2\theta + \cos^2\theta)\,d\theta \\
&= d\theta.
\end{aligned}
\]

In other words, $\alpha(x,y)$ is the differential of the function

\[
\theta(x,y) = \arctan\frac{y}{x}, \qquad (x,y) \ne (0,0),
\]

as we can verify directly:

\[
\frac{\partial\theta}{\partial x} = \frac{-y/x^2}{1 + (y/x)^2} = \frac{-y}{x^2+y^2}, \qquad
\frac{\partial\theta}{\partial y} = \frac{1/x}{1 + (y/x)^2} = \frac{x}{x^2+y^2}.
\]

The graph of the function $z = \theta(x,y)$ is a spiral ramp, as shown below. The $z$-axis is not part of the graph, because $(x,y) \ne (0,0)$. The ray $\theta = c$ in the $(x,y)$-plane is carried to the level $z = c$ but also to $z = c + 2\pi n$ for every integer $n$. The polar angle $\theta$ is therefore multiple-valued in a particular way; the graph reflects this by spiraling around the origin infinitely many times, with successive levels separated by $\Delta z = 2\pi$.



\[
\int_C \alpha = \int_C \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy = \Delta\theta \ \text{on } C.
\]

Ordinarily, the integral of the differential $dg$ of a function $g$ is zero on a closed path, because $\Delta g = 0$ there. The same is true of the integral of $d\theta$ on a path like $C_2$ that does not enclose the origin. However, on a path like $C_1$ that does enclose the origin, $z = \theta$ does not return to its starting value (it “climbs the ramp”), and $\Delta\theta \ne 0$.

Definition 10.14 Let $C$ be an oriented closed path that does not meet the origin; the winding number of $C$ is

\[
W(C) = \frac{1}{2\pi} \oint_C \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy.
\]

By what we’ve said, $W(C) = \Delta\theta/2\pi$ is an integer; it counts the net number of times $C$ “winds around” the origin in the positive sense minus the number of times in the negative sense. Furthermore, $W(-C) = -W(C)$.
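The winding number can be approximated numerically, which makes its integer values easy to observe. The following is a rough sketch, not a definitive implementation: the function names `winding_number` and `circle` are our own, the closed path is assumed to be given as a map from $[0,1]$ to the plane that avoids the origin, and the path integral is replaced by a midpoint sum over short chords.

```python
import math


def winding_number(path, n=20000):
    """Approximate W(C) = (1/2 pi) * integral over C of (-y dx + x dy)/(x^2 + y^2).

    `path` maps t in [0, 1] to a point (x, y) on a closed curve that
    does not pass through the origin.
    """
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, n + 1):
        x1, y1 = path(i / n)
        # Evaluate the 1-form at the midpoint of the chord from (x0, y0) to (x1, y1).
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        total += (-ym * (x1 - x0) + xm * (y1 - y0)) / (xm * xm + ym * ym)
        x0, y0 = x1, y1
    return total / (2 * math.pi)


def circle(cx, cy, radius, turns=1):
    """Circle centered at (cx, cy), traversed `turns` times (negative = clockwise)."""
    def path(t):
        angle = 2 * math.pi * turns * t
        return cx + radius * math.cos(angle), cy + radius * math.sin(angle)
    return path


w_twice = winding_number(circle(0, 0, 1, turns=2))     # encloses the origin twice
w_away = winding_number(circle(3, 0, 1))               # does not enclose the origin
w_reversed = winding_number(circle(0, 0, 1, turns=-1)) # opposite orientation
```

The three sample values come out close to 2, 0, and $-1$, illustrating both the dependence on whether the path encloses the origin and the rule $W(-C) = -W(C)$.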

We now want to determine how a differential form is transformed when a map introduces new variables (possibly more general than an invertible change of variables). First, consider a map from (an open set in) $\mathbb{R}^2$ to $\mathbb{R}^2$:

\[
\mathbf{f}: \begin{cases} x = x(u,v), \\ y = y(u,v), \end{cases}
\qquad
\begin{cases} dx = x_u\,du + x_v\,dv, \\ dy = y_u\,du + y_v\,dv. \end{cases}
\]

We assume that $\mathbf{f}$ is continuously differentiable, but do not assume, for the moment, that it is invertible. In other words, $\mathbf{f}$ need not be a coordinate change. A general 0-form $\alpha(x,y) = g(x,y)$ is transformed into

\[
\mathbf{f}^*\alpha(u,v) = g^*(u,v) = g(x(u,v), y(u,v)).
\]

A general 1-form $\alpha(x,y) = P(x,y)\,dx + Q(x,y)\,dy$ is transformed into

\[
\begin{aligned}
\mathbf{f}^*\alpha(u,v) &= P(x(u,v), y(u,v))(x_u\,du + x_v\,dv) + Q(x(u,v), y(u,v))(y_u\,du + y_v\,dv) \\
&= (P^* x_u + Q^* y_u)\,du + (P^* x_v + Q^* y_v)\,dv.
\end{aligned}
\]

For some purposes, the functions in differential forms should have continuous second derivatives. For $\mathbf{f}^*\alpha$ to meet that requirement, the components of $\mathbf{f}$ should have continuous third derivatives, because $\mathbf{f}^*\alpha$ contains the first derivatives of those components.

The basic 2-form $\alpha(x,y) = dx\,dy$ is transformed into

\[
\begin{aligned}
\mathbf{f}^*\alpha(u,v) &= (x_u\,du + x_v\,dv) \wedge (y_u\,du + y_v\,dv) \\
&= x_u y_v\,du \wedge dv + x_v y_u\,dv \wedge du = \frac{\partial(x,y)}{\partial(u,v)}\,du\,dv.
\end{aligned}
\]

Therefore, if $\alpha(x,y) = g(x,y)\,dx\,dy$, the general 2-form in $\mathbb{R}^2$, then

\[
\mathbf{f}^*\alpha(u,v) = g^*(u,v)\,\frac{\partial(x,y)}{\partial(u,v)}\,du\,dv.
\]

Notice that, although $\mathbf{f}$ maps the $(u,v)$-plane to the $(x,y)$-plane, the map $\mathbf{f}^*$ “goes the other way”: it maps differential forms on the $(x,y)$-plane to differential forms on the $(u,v)$-plane. Thus $\mathbf{f}^*$ pulls back forms from the $(x,y)$-plane to the $(u,v)$-plane; we call it the pullback on forms defined by $\mathbf{f}$.


[Diagram: $\mathbf{f}$ maps points $(u,v)$ to points $(x,y)$; $\mathbf{f}^*$ maps forms in $(x,y)$ back to forms in $(u,v)$.]

The map $\mathbf{f}$ need not preserve dimension; indeed, $\mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^3$ is an important case:

\[
\mathbf{f}: \begin{cases} x = x(u,v), \\ y = y(u,v), \\ z = z(u,v), \end{cases}
\qquad
\begin{cases} dx = x_u\,du + x_v\,dv, \\ dy = y_u\,du + y_v\,dv, \\ dz = z_u\,du + z_v\,dv. \end{cases}
\]

Pullbacks of 0- and 1-forms are similar to those for maps from $\mathbb{R}^2$ to $\mathbb{R}^2$. To be specific, if $\alpha(x,y,z) = g(x,y,z)$, then

\[
\mathbf{f}^*\alpha(u,v) = g^*(u,v) = g(x(u,v), y(u,v), z(u,v));
\]

and if $\alpha = P(x,y,z)\,dx + Q(x,y,z)\,dy + R(x,y,z)\,dz$, then

\[
\mathbf{f}^*\alpha(u,v) = (P^* x_u + Q^* y_u + R^* z_u)\,du + (P^* x_v + Q^* y_v + R^* z_v)\,dv.
\]


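There are similarities for 2-forms, as well; we just need to take into account that there are now three basic 2-forms in $\mathbb{R}^3$: $dx\,dy$, $dy\,dz$, and $dz\,dx$. However, in $\mathbb{R}^2$ there is only one basic 2-form, so the previous case of maps from $\mathbb{R}^2$ to $\mathbb{R}^2$ need merely be applied three times:

\[
\mathbf{f}^* dx\,dy = \frac{\partial(x,y)}{\partial(u,v)}\,du\,dv, \qquad
\mathbf{f}^* dy\,dz = \frac{\partial(y,z)}{\partial(u,v)}\,du\,dv, \qquad
\mathbf{f}^* dz\,dx = \frac{\partial(z,x)}{\partial(u,v)}\,du\,dv.
\]

Thus, for the general 2-form $\alpha = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy$, the pullback is

\[
\begin{aligned}
\mathbf{f}^*\alpha &= \left( X^* \frac{\partial(y,z)}{\partial(u,v)} + Y^* \frac{\partial(z,x)}{\partial(u,v)} + Z^* \frac{\partial(x,y)}{\partial(u,v)} \right) du\,dv \\
&= \left( X(\mathbf{f}(u,v)) \frac{\partial(y,z)}{\partial(u,v)} + Y(\mathbf{f}(u,v)) \frac{\partial(z,x)}{\partial(u,v)} + Z(\mathbf{f}(u,v)) \frac{\partial(x,y)}{\partial(u,v)} \right) du\,dv.
\end{aligned}
\]

The final case to consider is the general 3-form $\alpha = H(x,y,z)\,dx\,dy\,dz$, but its pullback is zero because every 3-form in two variables must reduce to zero.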

The language of differential forms and pullbacks gives us a vivid and succinct way to reformulate the definition of a surface integral (Definition 10.4, p. 402). Thus we are given an oriented surface patch $S$ and a 2-form

\[
\omega = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy
\]

that is defined and continuous on $S$. If $\mathbf{f}: \Omega \to \mathbb{R}^3 : U \to S$ parametrizes $S$ (Definition 10.2, p. 392), then

\[
\iint_S \omega = \iint_{\mathbf{f}(U)} \omega \quad\text{is by definition equal to}\quad \iint_U \mathbf{f}^*\omega.
\]
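To see the reformulated definition at work, one can approximate the double integral of a pullback numerically. The sketch below is only an illustration under stated assumptions: the function name `pullback_integral` is hypothetical, the partial derivatives of the parametrization are estimated by central differences rather than computed exactly, and the double integral is a midpoint sum over a rectangle in the $(u,v)$-plane. The sample surface is a sphere of radius $R$ in spherical coordinates, and the sample form is $x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$, whose integral over the outward-oriented sphere is $4\pi R^3$.

```python
import math


def pullback_integral(form, f, u_range, v_range, n=200, h=1e-5):
    """Midpoint-rule approximation of the double integral of f*omega.

    form(x, y, z) returns the coefficients (X, Y, Z) of
    omega = X dy dz + Y dz dx + Z dx dy; f(u, v) returns (x, y, z).
    """
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            # Central-difference estimates of the partial derivatives of f.
            xu, yu, zu = [(p - q) / (2 * h) for p, q in zip(f(u + h, v), f(u - h, v))]
            xv, yv, zv = [(p - q) / (2 * h) for p, q in zip(f(u, v + h), f(u, v - h))]
            X, Y, Z = form(*f(u, v))
            # Coefficient of du dv in f*omega: X d(y,z)/d(u,v) + Y d(z,x)/d(u,v) + Z d(x,y)/d(u,v).
            coeff = (X * (yu * zv - yv * zu)
                     + Y * (zu * xv - zv * xu)
                     + Z * (xu * yv - xv * yu))
            total += coeff * du * dv
    return total


R = 2.0


def sphere(u, v):
    """Sphere of radius R; u is the angle from the z-axis, v the longitude."""
    return (R * math.sin(u) * math.cos(v),
            R * math.sin(u) * math.sin(v),
            R * math.cos(u))


flux = pullback_integral(lambda x, y, z: (x, y, z), sphere,
                         (0.0, math.pi), (0.0, 2 * math.pi))
```

With this ordering of the parameters the pulled-back coefficient works out to $R^3 \sin u$, so `flux` is close to $4\pi R^3$.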


Let us now determine the pullback map for a continuously differentiable map from $\mathbb{R}^3$ to $\mathbb{R}^3$:

\[
\mathbf{f}: \begin{cases} x = x(u,v,w), \\ y = y(u,v,w), \\ z = z(u,v,w), \end{cases}
\qquad
\begin{cases} dx = x_u\,du + x_v\,dv + x_w\,dw, \\ dy = y_u\,du + y_v\,dv + y_w\,dw, \\ dz = z_u\,du + z_v\,dv + z_w\,dw. \end{cases}
\]

Again, pullbacks of 0- and 1-forms are similar to what we have already seen. For the general 0-form $\alpha(x,y,z) = g(x,y,z)$,

\[
\mathbf{f}^* g = g^*(u,v,w) = g(x(u,v,w), y(u,v,w), z(u,v,w));
\]

for the general 1-form $\alpha = P\,dx + Q\,dy + R\,dz$,

\[
\begin{aligned}
\mathbf{f}^*\alpha(u,v,w) &= (P^* x_u + Q^* y_u + R^* z_u)\,du + (P^* x_v + Q^* y_v + R^* z_v)\,dv \\
&\qquad + (P^* x_w + Q^* y_w + R^* z_w)\,dw.
\end{aligned}
\]

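With 2-forms, there are complications we have not seen before, because there are three basic 2-forms in the source. We begin with the basic 2-form $dx\,dy$ in the target, and write

\[
\begin{aligned}
\mathbf{f}^* dx\,dy &= (x_u\,du + x_v\,dv + x_w\,dw) \wedge (y_u\,du + y_v\,dv + y_w\,dw) \\
&= x_u y_v\,du \wedge dv + x_u y_w\,du \wedge dw + x_v y_u\,dv \wedge du \\
&\qquad + x_v y_w\,dv \wedge dw + x_w y_u\,dw \wedge du + x_w y_v\,dw \wedge dv \\
&= \frac{\partial(x,y)}{\partial(u,v)}\,du\,dv + \frac{\partial(x,y)}{\partial(v,w)}\,dv\,dw + \frac{\partial(x,y)}{\partial(w,u)}\,dw\,du.
\end{aligned}
\]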


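Using similar results for the other basic 2-forms (i.e., $dy\,dz$ and $dz\,dx$; see the exercises), we find that the general 2-form

\[
\alpha = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy
\]

is transformed into

\[
\begin{aligned}
\mathbf{f}^*\alpha(u,v,w) &= \left[ X^* \frac{\partial(y,z)}{\partial(v,w)} + Y^* \frac{\partial(z,x)}{\partial(v,w)} + Z^* \frac{\partial(x,y)}{\partial(v,w)} \right] dv\,dw \\
&\quad + \left[ X^* \frac{\partial(y,z)}{\partial(w,u)} + Y^* \frac{\partial(z,x)}{\partial(w,u)} + Z^* \frac{\partial(x,y)}{\partial(w,u)} \right] dw\,du \\
&\quad + \left[ X^* \frac{\partial(y,z)}{\partial(u,v)} + Y^* \frac{\partial(z,x)}{\partial(u,v)} + Z^* \frac{\partial(x,y)}{\partial(u,v)} \right] du\,dv.
\end{aligned}
\]

The pullback of the general 3-form $\alpha = H\,dx\,dy\,dz$ is straightforward:

\[
\mathbf{f}^*\alpha(u,v,w) = H^*\,\frac{\partial(x,y,z)}{\partial(u,v,w)}\,du\,dv\,dw.
\]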

Differential forms are involved in the calculation of physical quantities (e.g., work and total flux) whose values should be independent of the coordinate frames in which they are computed. This prompts us to determine how the integrals of $\alpha$ and its pullback $\varphi^*\alpha$ are related when $\varphi$ is a coordinate change, that is, an invertible map with a continuously differentiable inverse.

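Theorem 10.12. If $\alpha(\mathbf{x}) = P\,dx + Q\,dy + R\,dz$ is a 1-form, $C$ is an oriented curve in $\mathbf{u}$-space, and $\mathbf{x} = \varphi(\mathbf{u})$ is a coordinate change, then

\[
\int_{\varphi(C)} \alpha = \int_C \varphi^*\alpha.
\]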



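Proof. For simplicity, take $\alpha = P\,dx$; the other terms $Q\,dy$ and $R\,dz$ can be handled the same way. Then we know that

\[
\varphi^*\alpha(\mathbf{u}) = P^* x_u\,du + P^* x_v\,dv + P^* x_w\,dw,
\]

where $P^*(\mathbf{u}) = P(\varphi(\mathbf{u}))$. Let $C$ be parametrized as $(u,v,w) = \mathbf{u}(t)$, with $a \le t \le b$. Then $\varphi(C)$ is parametrized by $(x,y,z) = \varphi(\mathbf{u}(t))$, $a \le t \le b$, and we have, by the chain rule,

\[
\int_{\varphi(C)} \alpha = \int_a^b P(\varphi(\mathbf{u}(t)))\,\frac{d}{dt}\,x(\mathbf{u}(t))\,dt
= \int_a^b P^*(\mathbf{u}(t))\,\bigl(x_u u' + x_v v' + x_w w'\bigr)\,dt = \int_C \varphi^*\alpha. \qquad □
\]

An abbreviated version of the same proof shows that the theorem is true for 1-forms $\alpha(x,y)$ on the plane.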

If $\alpha(x,y) = g\,dx\,dy$ is a 2-form, $D$ an oriented region in the $(u,v)$-plane, and $\varphi$ is an invertible coordinate change, then

\[
\iint_{\varphi(D)} \alpha = \iint_{\varphi(D)} g\,dx\,dy = \iint_D g^*\,\frac{\partial(x,y)}{\partial(u,v)}\,du\,dv = \iint_D \varphi^*\alpha.
\]

The second equality is the change of variables formula for oriented double integrals (Theorem 9.14, p. 357). The next theorem deals with 2-forms in space.

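Theorem 10.13. If $\alpha(\mathbf{x}) = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy$ is a 2-form, $S$ is an oriented surface patch in $\mathbf{u}$-space, and $\mathbf{x} = \varphi(\mathbf{u})$ is a coordinate change, then $\varphi(S)$ is an oriented surface patch in $\mathbf{x}$-space and

\[
\iint_{\varphi(S)} \alpha = \iint_S \varphi^*\alpha.
\]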



Proof. Because $S$ is an oriented surface patch in $(u,v,w)$-space, it has a parametrization $\mathbf{f}: \Omega \to \mathbb{R}^3 : (s,t) \to (u,v,w)$ with $S = \mathbf{f}(U)$ for some closed, bounded, positively oriented set $U \subset \Omega$ with area (Definition 10.2, p. 392). Because $\varphi$ is a coordinate change in $\mathbb{R}^3$, the map

\[
\varphi \circ \mathbf{f}: \Omega \to \mathbb{R}^3 : (s,t) \to (x,y,z)
\]

serves to parametrize $\varphi(S) = (\varphi \circ \mathbf{f})(U)$, which is therefore an oriented surface patch in $(x,y,z)$-space. The two surface integrals that appear in the theorem can therefore be defined using the pullbacks of $\mathbf{f}$ and $\varphi \circ \mathbf{f}$. The following lemma indicates how these pullbacks are related.

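Lemma 10.1. For any 2-form $\alpha$, $(\varphi \circ \mathbf{f})^*\alpha = (\mathbf{f}^* \circ \varphi^*)\alpha = \mathbf{f}^*(\varphi^*\alpha)$.

Proof. For simplicity, take $\alpha = X\,dy\,dz$; the other terms $Y\,dz\,dx$ and $Z\,dx\,dy$ can be analyzed similarly. We have

\[
(\varphi \circ \mathbf{f})^*\alpha(\mathbf{s}) = X(\varphi(\mathbf{f}(\mathbf{s})))\,\frac{\partial(y,z)}{\partial(s,t)}\,ds\,dt.
\]

We also have

\[
\varphi^*\alpha(\mathbf{u}) = X(\varphi(\mathbf{u})) \left( \frac{\partial(y,z)}{\partial(v,w)}\,dv\,dw + \frac{\partial(y,z)}{\partial(w,u)}\,dw\,du + \frac{\partial(y,z)}{\partial(u,v)}\,du\,dv \right),
\]

from which it follows that

\[
\mathbf{f}^*(\varphi^*\alpha)(\mathbf{s}) = X(\varphi(\mathbf{f}(\mathbf{s}))) \left( \frac{\partial(y,z)}{\partial(v,w)}^{\!*} \frac{\partial(v,w)}{\partial(s,t)} + \frac{\partial(y,z)}{\partial(w,u)}^{\!*} \frac{\partial(w,u)}{\partial(s,t)} + \frac{\partial(y,z)}{\partial(u,v)}^{\!*} \frac{\partial(u,v)}{\partial(s,t)} \right) ds\,dt.
\]

The three Jacobians marked with asterisks (which are usually not written explicitly) are understood to be functions of $s$ and $t$ via pullbacks. By Exercise 10.27,

\[
\frac{\partial(y,z)}{\partial(v,w)} \frac{\partial(v,w)}{\partial(s,t)} + \frac{\partial(y,z)}{\partial(w,u)} \frac{\partial(w,u)}{\partial(s,t)} + \frac{\partial(y,z)}{\partial(u,v)} \frac{\partial(u,v)}{\partial(s,t)} = \frac{\partial(y,z)}{\partial(s,t)},
\]

so the two pullbacks agree.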


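To complete the proof of the theorem, we use the lemma and twice invoke the new formulation (p. 432) of the definition of a surface integral as the ordinary double integral of a pullback:

\[
\iint_{\varphi(S)} \alpha = \iint_{\varphi(\mathbf{f}(U))} \alpha = \iint_U (\varphi \circ \mathbf{f})^*\alpha = \iint_U \mathbf{f}^*(\varphi^*\alpha) = \iint_{\mathbf{f}(U)} \varphi^*\alpha = \iint_S \varphi^*\alpha. \qquad □
\]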


Corollary 10.14 Suppose that $S$ is a piecewise-smooth oriented surface; then so is $\varphi(S)$, and

\[
\iint_{\varphi(S)} \alpha = \iint_S \varphi^*\alpha.
\]

Proof. Let $S = S_1 + \cdots + S_k$ be a decomposition into oriented surface patches, and suppose $S_i$ is parametrized by $\mathbf{f}_i: \Omega_i \to \mathbb{R}^3$, with $\mathbf{f}_i(U_i) = S_i$. Then, by the proof of the theorem, $\varphi(S_i)$ is an oriented surface patch parametrized by $\varphi \circ \mathbf{f}_i$. Therefore, because $\varphi$ is 1-1,

\[
\varphi(S) = \varphi(S_1) + \cdots + \varphi(S_k)
\]

is a piecewise-smooth oriented surface. Finally, using the theorem on each pair of surface patches $\varphi(S_i)$ and $S_i$, we obtain

\[
\iint_{\varphi(S)} \alpha = \sum_{i=1}^k \iint_{\varphi(S_i)} \alpha = \sum_{i=1}^k \iint_{S_i} \varphi^*\alpha = \iint_S \varphi^*\alpha. \qquad □
\]

The final possibility we must consider is how the integral of a 3-form $\alpha = H(\mathbf{x})\,dx\,dy\,dz$ transforms under a coordinate change $\mathbf{x} = \varphi(\mathbf{u})$. We know

\[
\varphi^*\alpha = H^*(\mathbf{u})\,\frac{\partial(x,y,z)}{\partial(u,v,w)}\,du\,dv\,dw,
\]

where $H^*(\mathbf{u}) = H(\varphi(\mathbf{u}))$. The integrals here are triple integrals over an oriented region $D$ in $\mathbf{u}$-space and its image $\varphi(D)$ in $\mathbf{x}$-space. The change of variables formula for oriented triple integrals (Theorem 9.16, p. 363) yields

\[
\iiint_{\varphi(D)} \alpha = \iiint_{\varphi(D)} H(\mathbf{x})\,dx\,dy\,dz
= \iiint_D H^*(\mathbf{u})\,\frac{\partial(x,y,z)}{\partial(u,v,w)}\,du\,dv\,dw = \iiint_D \varphi^*\alpha.
\]

Theorem 10.15. The integral of a differential $k$-form in $n$ variables (where $k \le n$ and $n = 1, 2, 3$) is invariant under a coordinate change. □

In terms of the symbolic integral pairing (p. 427), all the statements about the invariance of integrals under coordinate changes have the form

\[
\langle \varphi(D), \alpha \rangle = \langle D, \varphi^*\alpha \rangle,
\]

where $\alpha$ is a $k$-form in $n$ variables, $n = 1, 2, 3$. In other words, the map $\varphi$ and its pullback $\varphi^*$ are adjoints (p. 428). This is the essential content of the change of variables formulas (wherein we take $k = n$). Incidentally, it is easy to check that the pairings are also equal when $n = 0$.
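The adjoint relation can be checked numerically for a 1-form in the plane. The sketch below makes several choices of our own: the helper `line_integral` is hypothetical and uses a midpoint sum over short chords, the coordinate change is the polar map, the form is $\alpha = -y\,dx + x\,dy$, and the curve $C$ in the $(r,\theta)$-plane is $r = 1 + t$, $\theta = \pi t$ for $0 \le t \le 1$. A direct calculation gives $\varphi^*\alpha = r^2\,d\theta$, so both pairings should equal $\int_0^1 \pi(1+t)^2\,dt = 7\pi/3$.

```python
import math


def line_integral(P, Q, path, n=20000):
    """Midpoint-sum approximation of the integral of P da + Q db along `path`.

    `path` maps t in [0, 1] to a point (a, b); P and Q are the coefficients
    of the 1-form in whatever coordinates the path uses.
    """
    total = 0.0
    a0, b0 = path(0.0)
    for i in range(1, n + 1):
        a1, b1 = path(i / n)
        am, bm = 0.5 * (a0 + a1), 0.5 * (b0 + b1)
        total += P(am, bm) * (a1 - a0) + Q(am, bm) * (b1 - b0)
        a0, b0 = a1, b1
    return total


def C(t):
    """The curve C in the (r, theta)-plane (our sample choice)."""
    return 1.0 + t, math.pi * t


def phi_C(t):
    """The image curve phi(C) in the (x, y)-plane under the polar map."""
    r, theta = C(t)
    return r * math.cos(theta), r * math.sin(theta)


# <phi(C), alpha> with alpha = -y dx + x dy, computed in the (x, y)-plane.
lhs = line_integral(lambda x, y: -y, lambda x, y: x, phi_C)
# <C, phi* alpha> with phi* alpha = 0 dr + r^2 dtheta, computed in the (r, theta)-plane.
rhs = line_integral(lambda r, th: 0.0, lambda r, th: r * r, C)
```

The two values agree with each other, and with $7\pi/3$, up to the discretization error.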

How does exterior differentiation interact with a pullback? Does it make a difference if we apply the exterior derivative before or after applying a map? In other words, do $d$ and $\mathbf{f}^*$ commute? To explore this question, let us return to differential forms in just two variables. First consider the 0-form $\alpha = g(x,y)$; then

\[
\mathbf{f}^*\alpha = g(x(u,v), y(u,v)),
\]

so

\[
\begin{aligned}
d(\mathbf{f}^*\alpha) &= \frac{\partial}{\partial u}\,g(x(u,v), y(u,v))\,du + \frac{\partial}{\partial v}\,g(x(u,v), y(u,v))\,dv \\
&= \bigl(g_1(x(u,v), y(u,v))\,x_u + g_2(x(u,v), y(u,v))\,y_u\bigr)\,du \\
&\qquad + \bigl(g_1(x(u,v), y(u,v))\,x_v + g_2(x(u,v), y(u,v))\,y_v\bigr)\,dv \\
&= (g_1^* x_u + g_2^* y_u)\,du + (g_1^* x_v + g_2^* y_v)\,dv.
\end{aligned}
\]

Here $g_i$ is the partial derivative of $g(x,y)$ with respect to its $i$th variable, and $g_i^* = (g_i)^* = \mathbf{f}^*(g_i)$ (so $g_i^* \ne (g^*)_i$, in general). On the other hand, $d\alpha = g_1\,dx + g_2\,dy$, and

\[
\begin{aligned}
\mathbf{f}^*(d\alpha) &= g_1^* \cdot (x_u\,du + x_v\,dv) + g_2^* \cdot (y_u\,du + y_v\,dv) \\
&= (g_1^* x_u + g_2^* y_u)\,du + (g_1^* x_v + g_2^* y_v)\,dv \\
&= d(\mathbf{f}^*\alpha).
\end{aligned}
\]

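Next, consider the 1-form $\alpha = P\,dx$; then

\[
\mathbf{f}^*\alpha = P^* x_u\,du + P^* x_v\,dv,
\]

so

\[
\begin{aligned}
d(\mathbf{f}^*\alpha) &= \frac{\partial}{\partial v}(P^* x_u)\,dv \wedge du + \frac{\partial}{\partial u}(P^* x_v)\,du \wedge dv \\
&= \bigl[-(P_1^* x_v + P_2^* y_v)\,x_u - P^* x_{uv} + (P_1^* x_u + P_2^* y_u)\,x_v + P^* x_{vu}\bigr]\,du\,dv \\
&= -P_2^*\,(y_v x_u - y_u x_v)\,du\,dv \\
&= -P_2^*\,\frac{\partial(x,y)}{\partial(u,v)}\,du\,dv.
\end{aligned}
\]

On the other hand, $d\alpha = -P_2\,dx\,dy$, and (from p. 431)

\[
\mathbf{f}^*(d\alpha) = -P_2^*\,\frac{\partial(x,y)}{\partial(u,v)}\,du\,dv = d(\mathbf{f}^*\alpha).
\]

Analysis of $\alpha = Q\,dy$ is similar. If $\alpha(x,y)$ is a $k$-form with $k \ge 2$, then $d\alpha = 0 = d(\mathbf{f}^*\alpha)$.

Theorem 10.16. For any differentiable map $(x,y) = \mathbf{f}(u,v)$ and $k$-form $\alpha(x,y)$, $\mathbf{f}^*(d\alpha) = d(\mathbf{f}^*\alpha)$. □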

In fact, this theorem holds for differential forms $\alpha(x_1, \ldots, x_n)$ in any number of variables. In Exercise 10.28, you are asked to give a proof for $n = 3$.
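A numerical spot check of the rule that $d$ and $\mathbf{f}^*$ commute may be reassuring. The sketch below is an illustration only, with every specific chosen by us: the map is $\mathbf{f}(u,v) = (u^2 - v^2,\, 2uv)$, the form is $\alpha = P\,dx$ with $P = xy^2$, and the function names are hypothetical. The coefficient of $du\,dv$ in $d(\mathbf{f}^*\alpha)$ is estimated by central differences and compared with the exact coefficient $-P_2^*\,\partial(x,y)/\partial(u,v)$ of $\mathbf{f}^*(d\alpha)$.

```python
def f(u, v):
    """Sample map (our choice): x = u^2 - v^2, y = 2uv."""
    return u * u - v * v, 2 * u * v


def P(x, y):
    """Sample coefficient of alpha = P dx."""
    return x * y * y


def P_2(x, y):
    """Partial derivative of P with respect to its second variable."""
    return 2 * x * y


def pullback_coeffs(u, v):
    """Coefficients (A, B) of f*alpha = A du + B dv = P* x_u du + P* x_v dv."""
    x, y = f(u, v)
    x_u, x_v = 2 * u, -2 * v
    return P(x, y) * x_u, P(x, y) * x_v


def d_of_pullback(u, v, h=1e-5):
    """Coefficient B_u - A_v of du dv in d(f*alpha), by central differences."""
    B_u = (pullback_coeffs(u + h, v)[1] - pullback_coeffs(u - h, v)[1]) / (2 * h)
    A_v = (pullback_coeffs(u, v + h)[0] - pullback_coeffs(u, v - h)[0]) / (2 * h)
    return B_u - A_v


def pullback_of_d(u, v):
    """Coefficient of du dv in f*(d alpha) = -P_2* d(x,y)/d(u,v) du dv."""
    x, y = f(u, v)
    jacobian = (2 * u) * (2 * u) - (-2 * v) * (2 * v)   # x_u y_v - x_v y_u
    return -P_2(x, y) * jacobian


left = d_of_pullback(0.7, 0.3)
right = pullback_of_d(0.7, 0.3)
```

At the sample point the two coefficients agree to within the finite-difference error; trying other points or other choices of `P` gives the same agreement.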

The language of differential forms and symbolic pairings for integrals gives us a new way to look at the proofs of some earlier theorems. For example, consider the change of variables via Green’s theorem (Theorem 9.21, p. 370):

Suppose $\mathbf{f}(s,t) = (x(s,t), y(s,t))$ has continuous second derivatives on a bounded open set $\Omega$ in $\mathbb{R}^2$. Let $S \subset \Omega$ and $T = \mathbf{f}(S)$ be closed oriented sets whose boundaries $\partial S$ and $\partial T$ are simple closed curves. Assume that Green’s theorem holds for both $S$ and $T$, that $\partial S$ and $\mathbf{f}(\partial S)$ have common decompositions into smooth oriented curves, and that $\mathbf{f}(\partial S) = k \cdot \partial T$ as oriented paths. Then, for any continuous function $g(x,y)$ on $T$,

\[
k \iint_T g(x,y)\,dx\,dy = \iint_S g(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt.
\]

Proof. As in the original proof, choose $G(x,y)$ so that $G_x(x,y) = g(x,y)$, and let $\alpha = G\,dy$, $d\alpha = g\,dx\,dy$. A key step there was to transfer the path integral of $\alpha$ over $\mathbf{f}(\partial S)$ to the path integral of

\[
\mathbf{f}^*\alpha(s,t) = G(x(s,t), y(s,t))\,(y_s\,ds + y_t\,dt) = G^* y_s\,ds + G^* y_t\,dt
\]

over $\partial S$. In the language of symbolic pairings, $\langle \mathbf{f}(\partial S), \alpha \rangle = \langle \partial S, \mathbf{f}^*\alpha \rangle$, indicating that $\mathbf{f}$ and $\mathbf{f}^*$ are adjoints. The original proof invoked the results of an exercise. For future use, we restate these results in the language of differential forms and pullbacks.

Lemma 10.2. Let $\mathbf{f}: U^2 \to \mathbb{R}^2$ be continuously differentiable, and suppose $C$ and $\mathbf{f}(C)$ are piecewise-smooth oriented curves with a common decomposition into smooth oriented curves:

\[
C = C_1 + \cdots + C_m, \qquad \mathbf{f}(C) = \mathbf{f}(C_1) + \cdots + \mathbf{f}(C_m).
\]

Then, for any 1-form $\alpha$, $\langle \mathbf{f}(C), \alpha \rangle = \langle C, \mathbf{f}^*\alpha \rangle$.

Proof. See Exercise 4.37, page 149. □

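The proof of the change of variables theorem using Green’s theorem now follows from this sequence of equalities.

\[
\begin{aligned}
k \iint_T g(x,y)\,dx\,dy &= k\,\langle T, d\alpha \rangle \\
&= k\,\langle \partial T, \alpha \rangle && \text{Green’s theorem on } T \\
&= \langle k\,\partial T, \alpha \rangle \\
&= \langle \mathbf{f}(\partial S), \alpha \rangle && \text{hypothesis} \\
&= \langle \partial S, \mathbf{f}^*\alpha \rangle && \mathbf{f} \text{ and } \mathbf{f}^* \text{ are adjoints} \\
&= \langle S, d(\mathbf{f}^*\alpha) \rangle && \text{Green’s theorem on } S \\
&= \langle S, \mathbf{f}^*(d\alpha) \rangle && d \text{ and } \mathbf{f}^* \text{ commute} \\
&= \iint_S g(x(s,t), y(s,t))\,\frac{\partial(x,y)}{\partial(s,t)}\,ds\,dt,
\end{aligned}
\]

because $\mathbf{f}^*(d\alpha) = g^*(s,t)\,\dfrac{\partial(x,y)}{\partial(s,t)}\,ds\,dt$. □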
It is possible to construct differential $k$-forms in any number of variables. In $(x_1, x_2, \ldots, x_n)$-space, there are $n$ basic differentials, $dx_1, dx_2, \ldots, dx_n$. For each multi-index $I = (i_1, \ldots, i_k)$ with $1 \le i_1 < \cdots < i_k \le n$, we define the basic $k$-form

\[
d\mathbf{x}_I = dx_{i_1}\,dx_{i_2} \cdots dx_{i_k} = dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_k}.
\]

There are as many basic $k$-forms as there are ways of choosing $k$ distinct elements from a set of $n$ elements; this number is

\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\]

(“$n$ choose $k$”). A general $k$-form is a linear combination

\[
\alpha = \sum_I P_I(x_1, \ldots, x_n)\,d\mathbf{x}_I
\]

of the $\binom{n}{k}$ basic $k$-forms in which the coefficient functions $P_I$ all have continuous second derivatives. Products can then be calculated using the anticommutativity relations on the basic 1-forms:

\[
dx_i \wedge dx_j = -dx_j \wedge dx_i, \qquad i, j = 1, \ldots, n.
\]

As we pointed out above, anticommutativity here implies

\[
dx_i \wedge dx_i = 0, \qquad i = 1, \ldots, n.
\]

If $\sigma$ is a $k$-form and $\tau$ is an $l$-form, then anticommutativity on the basic 1-forms implies

\[
\sigma \wedge \tau = (-1)^{kl}\,\tau \wedge \sigma.
\]
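The counting of basic $k$-forms and the sign rules for their products are easy to mechanize. In the following sketch (an illustration with hypothetical names `basic_forms` and `wedge_basic`, not notation from the text), a basic $k$-form is represented by its increasing multi-index, and the sign of a product is found by counting the inversions needed to sort the concatenated indices, since each adjacent transposition contributes a factor of $-1$.

```python
from itertools import combinations
from math import comb


def basic_forms(n, k):
    """All multi-indices I = (i_1 < ... < i_k) drawn from 1..n."""
    return list(combinations(range(1, n + 1), k))


def wedge_basic(I, J):
    """Return (sign, K) with dx_I ^ dx_J = sign * dx_K and K increasing.

    The sign is 0 (and K is empty) when I and J share an index,
    because dx_i ^ dx_i = 0.
    """
    indices = list(I) + list(J)
    if len(set(indices)) < len(indices):
        return 0, ()
    # Each inversion corresponds to one transposition of adjacent 1-forms.
    inversions = sum(1
                     for a in range(len(indices))
                     for b in range(a + 1, len(indices))
                     if indices[a] > indices[b])
    return (-1) ** inversions, tuple(sorted(indices))


two_forms_in_R4 = basic_forms(4, 2)
swap = wedge_basic((2,), (1,))        # dx_2 ^ dx_1 = -dx_1 ^ dx_2
repeat = wedge_basic((1,), (1, 3))    # a repeated index gives 0
```

Running `wedge_basic(I, J)` and `wedge_basic(J, I)` over disjoint multi-indices reproduces the rule $\sigma \wedge \tau = (-1)^{kl}\,\tau \wedge \sigma$ on basic forms.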

The exterior derivative of the 0-form $g(x_1, \ldots, x_n)$ is the 1-form

\[
dg = \sum_{i=1}^n g_i\,dx_i,
\]

where $g_i = \partial g/\partial x_i$. For a general $k$-form

\[
\alpha = \sum_I P_I\,d\mathbf{x}_I,
\]

the exterior derivative is the $(k+1)$-form

\[
d\alpha = \sum_I dP_I \wedge d\mathbf{x}_I,
\]

obtained using the exterior derivatives $dP_I$ of the coefficient functions of $\alpha$.

Theorem 10.17. For every $k$-form $\alpha$ in $\mathbb{R}^n$, $d^2\alpha = d(d\alpha) = 0$.

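Proof. As in the essentially identical Theorem 10.11, the proof reduces to the “equality of mixed partials” for functions with continuous second derivatives. If $\alpha = \sum_I P_I\,d\mathbf{x}_I$, then

\[
d\alpha = \sum_I dP_I \wedge d\mathbf{x}_I = \sum_I \left( \sum_{i=1}^n \frac{\partial P_I}{\partial x_i}\,dx_i \right) \wedge d\mathbf{x}_I,
\]

and

\[
d^2\alpha = \sum_I \left( \sum_{i=1}^n d\!\left(\frac{\partial P_I}{\partial x_i}\right) \wedge dx_i \right) \wedge d\mathbf{x}_I
= \sum_I \left( \sum_{i,j=1}^n \frac{\partial^2 P_I}{\partial x_j\,\partial x_i}\,dx_j \wedge dx_i \right) \wedge d\mathbf{x}_I.
\]

The inner sum consists of $n^2$ terms. For each of the $n$ terms with $j = i$,

\[
\frac{\partial^2 P_I}{\partial x_i\,\partial x_i}\,dx_i \wedge dx_i = 0.
\]

Now pair each remaining term with the term in which $i$ and $j \ne i$ are interchanged:

\[
\frac{\partial^2 P_I}{\partial x_j\,\partial x_i}\,dx_j \wedge dx_i + \frac{\partial^2 P_I}{\partial x_i\,\partial x_j}\,dx_i \wedge dx_j.
\]

Because

\[
\frac{\partial^2 P_I}{\partial x_j\,\partial x_i} = \frac{\partial^2 P_I}{\partial x_i\,\partial x_j}
\quad\text{and}\quad dx_j \wedge dx_i = -dx_i \wedge dx_j,
\]

each pair sums to zero, so the entire inner sum equals zero. □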

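Theorem 10.18 (Product Rule). Suppose $\alpha$ and $\theta$ are differential forms in $\mathbb{R}^n$, and $\alpha$ is a $k$-form. Then $d(\alpha \wedge \theta) = d\alpha \wedge \theta + (-1)^k\,\alpha \wedge d\theta$.

Proof. It is sufficient to show this for “monomials”

\[
\alpha = P\,d\mathbf{x}_I \quad\text{and}\quad \theta = Q\,d\mathbf{x}_J
\]

that have disjoint multi-indices $I$ and $J$. Then $\alpha \wedge \theta = PQ\,d\mathbf{x}_I \wedge d\mathbf{x}_J$, and we have

\[
\begin{aligned}
d(\alpha \wedge \theta) &= \sum_m (P_m Q + P Q_m)\,dx_m \wedge d\mathbf{x}_I \wedge d\mathbf{x}_J \\
&= \sum_m P_m Q\,dx_m \wedge d\mathbf{x}_I \wedge d\mathbf{x}_J + \sum_m P Q_m\,dx_m \wedge d\mathbf{x}_I \wedge d\mathbf{x}_J,
\end{aligned}
\]

where $P_m = \partial P/\partial x_m$ and the summation can be restricted to those indices $m$ that do not occur in either $I$ or $J$. The first sum is $d\alpha \wedge \theta$, because

\[
d\alpha = \sum_m P_m\,dx_m \wedge d\mathbf{x}_I \quad\text{and}\quad d\alpha \wedge \theta = \sum_m P_m Q\,dx_m \wedge d\mathbf{x}_I \wedge d\mathbf{x}_J.
\]

The second sum is $(-1)^k\,\alpha \wedge d\theta$, because $d\theta = \sum_m Q_m\,dx_m \wedge d\mathbf{x}_J$ and

\[
\alpha \wedge d\theta = \sum_m P Q_m\,d\mathbf{x}_I \wedge dx_m \wedge d\mathbf{x}_J = (-1)^k \sum_m P Q_m\,dx_m \wedge d\mathbf{x}_I \wedge d\mathbf{x}_J.
\]

The last equality is a consequence of the anticommutativity of basic 1-forms:

\[
\begin{aligned}
d\mathbf{x}_I \wedge dx_m &= dx_{i_1} \wedge \cdots \wedge dx_{i_{k-1}} \wedge dx_{i_k} \wedge dx_m \\
&= (-1)\,dx_{i_1} \wedge \cdots \wedge dx_{i_{k-1}} \wedge dx_m \wedge dx_{i_k} \\
&= (-1)^2\,dx_{i_1} \wedge \cdots \wedge dx_m \wedge dx_{i_{k-1}} \wedge dx_{i_k} \\
&\ \ \vdots \\
&= (-1)^k\,dx_m \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_{k-1}} \wedge dx_{i_k} \\
&= (-1)^k\,dx_m \wedge d\mathbf{x}_I. \qquad □
\end{aligned}
\]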

Let us see how the coefficients of a $(k-1)$-form $\alpha$ in $n$ variables determine the coefficients of its exterior derivative $\omega = d\alpha$, a $k$-form. We use the $k$-multi-index $I = (i_1, \ldots, i_k)$, $1 \le i_1 < \cdots < i_k \le n$, for the coefficients of $\omega$,

\[
\omega = \sum_I P_I\,d\mathbf{x}_I.
\]

For the coefficients of $\alpha$, we use the $(k-1)$-multi-index

\[
I_{\hat{s}} = (i_1, \ldots, \widehat{i_s}, \ldots, i_k), \qquad 1 \le s \le k;
\]

the circumflex over $i_s$ means that $i_s$ is deleted, so each $I_{\hat{s}}$ contains only $k-1$ indices. Thus we can write

\[
\alpha = \sum A_{I_{\hat{s}}}\,d\mathbf{x}_{I_{\hat{s}}}, \qquad d\alpha = \sum d(A_{I_{\hat{s}}})\,d\mathbf{x}_{I_{\hat{s}}}.
\]

There are $\binom{n}{k}$ $k$-multi-indices $I$ and $\binom{n}{k-1}$ $(k-1)$-multi-indices $I_{\hat{s}}$.

Given that $d\alpha = \omega$, we must determine what contribution the various terms of $d(A_{I_{\hat{s}}})\,d\mathbf{x}_{I_{\hat{s}}}$ make to the term $P_I\,d\mathbf{x}_I$. We have

\[
d(A_{I_{\hat{s}}})\,d\mathbf{x}_{I_{\hat{s}}} = \sum_j \frac{\partial A_{I_{\hat{s}}}}{\partial x_j}\,dx_j\,dx_{i_1} \cdots \widehat{dx_{i_s}} \cdots dx_{i_k},
\]

and the sum has only one term that is a multiple of $d\mathbf{x}_I$, the one in which the summation index $j$ equals $i_s$. It then takes $s - 1$ successive transpositions to move $dx_j = dx_{i_s}$ from its initial position in that term to its proper position in the basic differential; that is,

\[
dx_{i_s}\,dx_{i_1} \cdots \widehat{dx_{i_s}} \cdots dx_{i_k} = (-1)^{s-1}\,dx_{i_1} \cdots dx_{i_s} \cdots dx_{i_k} = (-1)^{s-1}\,d\mathbf{x}_I.
\]

This proves the following theorem.

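Theorem 10.19. If $\alpha = \sum A_{I_{\hat{s}}}\,d\mathbf{x}_{I_{\hat{s}}}$, then

\[
d\alpha = \sum_I \left( \sum_{s=1}^k (-1)^{s-1}\,\frac{\partial A_{I_{\hat{s}}}}{\partial x_{i_s}} \right) d\mathbf{x}_I. \qquad □
\]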
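As a check on Theorem 10.19, it may help to work out the case $n = 3$, $k = 2$ and compare with the formula for the exterior derivative of a 1-form in $\mathbb{R}^3$ obtained earlier. The labeling below, with $(x_1, x_2, x_3) = (x, y, z)$ and $A_{(1)} = P$, $A_{(2)} = Q$, $A_{(3)} = R$, is our own choice for this illustration.

```latex
% alpha = P dx_1 + Q dx_2 + R dx_3, so A_{(1)} = P, A_{(2)} = Q, A_{(3)} = R.
% For each 2-multi-index I = (i_1, i_2), delete i_s (s = 1, 2) and apply the theorem:
\begin{aligned}
I = (1,2):&\quad (+1)\,\frac{\partial A_{(2)}}{\partial x_1} + (-1)\,\frac{\partial A_{(1)}}{\partial x_2} = Q_x - P_y, \\
I = (1,3):&\quad (+1)\,\frac{\partial A_{(3)}}{\partial x_1} + (-1)\,\frac{\partial A_{(1)}}{\partial x_3} = R_x - P_z, \\
I = (2,3):&\quad (+1)\,\frac{\partial A_{(3)}}{\partial x_2} + (-1)\,\frac{\partial A_{(2)}}{\partial x_3} = R_y - Q_z.
\end{aligned}
% Hence
d\alpha = (Q_x - P_y)\,dx\,dy + (R_x - P_z)\,dx\,dz + (R_y - Q_z)\,dy\,dz
        = (R_y - Q_z)\,dy\,dz + (P_z - R_x)\,dz\,dx + (Q_x - P_y)\,dx\,dy,
% using dx dz = -dz dx in the middle term.
```

The result agrees term by term with the earlier calculation for $\alpha = P\,dx + Q\,dy + R\,dz$.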


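Now let us consider how a differential form is transformed by the pullback of a differentiable map $\mathbf{f}: U^n \to \mathbb{R}^p$ with component functions

\[
\mathbf{f}: \begin{cases}
x_1 = x_1(u_1, \ldots, u_n), \\
x_2 = x_2(u_1, \ldots, u_n), \\
\quad\vdots \\
x_p = x_p(u_1, \ldots, u_n).
\end{cases}
\]

Here $U^n$ is an open subset of $\mathbb{R}^n$, and we allow $n \ne p$. For a 0-form, $\alpha(\mathbf{x}) = g(\mathbf{x})$, the pullback is

\[
\mathbf{f}^*\alpha(\mathbf{u}) = g(\mathbf{f}(\mathbf{u})) = g^*(\mathbf{u}).
\]

For a basic 1-form $dx_i$, the pullback is

\[
\mathbf{f}^* dx_i = \sum_{j=1}^n \frac{\partial x_i}{\partial u_j}\,du_j, \qquad i = 1, \ldots, p.
\]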

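For the basic 2-form $dx_1\,dx_2$, we have

\[
\begin{aligned}
\mathbf{f}^*(dx_1 \wedge dx_2) = \mathbf{f}^* dx_1 \wedge \mathbf{f}^* dx_2
&= \sum_j \frac{\partial x_1}{\partial u_j}\,du_j \wedge \sum_m \frac{\partial x_2}{\partial u_m}\,du_m \\
&= \sum_{j<m} \frac{\partial x_1}{\partial u_j} \frac{\partial x_2}{\partial u_m}\,du_j \wedge du_m
 + \sum_{j>m} \frac{\partial x_1}{\partial u_j} \frac{\partial x_2}{\partial u_m}\,du_j \wedge du_m.
\end{aligned}
\]

Anticommutativity simplifies this. If we transpose the dummy summation indices (i.e., $j \leftrightarrow m$) in the second sum, then that sum becomes

\[
\sum_{m>j} \frac{\partial x_1}{\partial u_m} \frac{\partial x_2}{\partial u_j}\,du_m \wedge du_j
= -\sum_{j<m} \frac{\partial x_1}{\partial u_m} \frac{\partial x_2}{\partial u_j}\,du_j \wedge du_m.
\]

Recombining this with the first sum, we get

\[
\begin{aligned}
\mathbf{f}^*(dx_1\,dx_2) &= \sum_{j<m} \left( \frac{\partial x_1}{\partial u_j} \frac{\partial x_2}{\partial u_m} - \frac{\partial x_1}{\partial u_m} \frac{\partial x_2}{\partial u_j} \right) du_j \wedge du_m \\
&= \sum_{j<m} \frac{\partial(x_1, x_2)}{\partial(u_j, u_m)}\,du_j\,du_m.
\end{aligned}
\]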

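This is essentially the same as the earlier calculation of $\mathbf{f}^*(dx\,dy)$ on page 433. More generally, if $I = (i_1, i_2)$ is any pair with $1 \le i_1 < i_2 \le p$, then

\[
\mathbf{f}^*(dx_{i_1}\,dx_{i_2}) = \sum_{j<m} \frac{\partial(x_{i_1}, x_{i_2})}{\partial(u_j, u_m)}\,du_j\,du_m.
\]

We can write this in a way that is both more compact and more striking:

\[
\mathbf{f}^* d\mathbf{x}_I = \sum_J \frac{\partial \mathbf{x}_I}{\partial \mathbf{u}_J}\,d\mathbf{u}_J.
\]

The summation multi-index $J$ consists of all pairs $J = (j_1, j_2)$ with $1 \le j_1 < j_2 \le n$, and

\[
\frac{\partial \mathbf{x}_I}{\partial \mathbf{u}_J} = \frac{\partial(x_{i_1}, x_{i_2})}{\partial(u_{j_1}, u_{j_2})}.
\]

In fact, the same formula holds when $d\mathbf{x}_I$ is a basic $k$-form (where $I = (i_1, \ldots, i_k)$ and $1 \le i_1 < \cdots < i_k \le p$):

\[
\mathbf{f}^* d\mathbf{x}_I = \sum_J \frac{\partial \mathbf{x}_I}{\partial \mathbf{u}_J}\,d\mathbf{u}_J,
\]

where now $J = (j_1, \ldots, j_k)$ with $1 \le j_1 < \cdots < j_k \le n$, and

\[
\frac{\partial \mathbf{x}_I}{\partial \mathbf{u}_J} = \frac{\partial(x_{i_1}, \ldots, x_{i_k})}{\partial(u_{j_1}, \ldots, u_{j_k})}.
\]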

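Theorem 10.20. If $\alpha = \sum_I P_I(\mathbf{x})\,d\mathbf{x}_I$ is a general $k$-form, then

\[
\mathbf{f}^*\alpha = \sum_I \sum_J P_I^*\,\frac{\partial \mathbf{x}_I}{\partial \mathbf{u}_J}\,d\mathbf{u}_J. \qquad □
\]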

Exercises 

10.1. Suppose there is a steady flow of matter given by the vector $\mathbf{V} = (2, -7, 1)$ kilograms per second per square meter. (All space coordinates are given in meters.)

a. In 1 second, how much matter passes through a unit square in the (x,y)- 
plane in the positive z-direction? Through a unit square in the (y,z)-plane 
in the positive x-direction? Through a unit square in the (z,x)-plane in the 
positive y-direction? 

b. How much matter passes through a triangle with area 12 meters 2 in the 
(x,y)-plane in the positive z-direction in 7 seconds? 

c. In 10 seconds, how much matter passes through the rectangle with vertices

(5,0,0), (5,3,0), (5,3,6), (5,0,6)

in the direction in which x increases?

d. In unit time, how much matter passes through a unit square in the plane 
x +y +z = 1 in the “upward” direction (i.e., the direction in which z 
increases)? 

e. Calculate how much matter passes through each of the six faces of the 
unit cube Q in the first octant, in the outward direction on each face in 
unit time. The sum of these six numbers is zero; why? 

10.2. Determine the total flux of the flow field V = (0 ,z,x) through: 

a. The unit square in the (y,z)-plane, oriented in the positive x-direction. 

b. The unit square in the (x, y) -plane, oriented in the negative z-direction. 

c. The triangle with vertices (2,2,0), (0,2,2), (2,0,2), using this ordering 
of the vertices to orient the boundary and thus the triangle itself. 


10.3. Determine the flux of the flow field $\mathbf{V} = (x, y, z)$ through the surface $S$ given by

\[
\begin{cases} x = u + v, \\ y = u^2 - v^2, \\ z = 2uv, \end{cases}
\qquad 0 \le u \le 3, \quad 0 \le v \le 1.
\]

Assume that $S$ inherits the positive orientation of the $(u,v)$-plane.

10.4. Calculate the flux of $\mathbf{V} = (x, y, z)$ out of the sphere $S$ of radius $R$ centered at the origin $(x,y,z) = (0,0,0)$ to show

\[
\iint_S \mathbf{V} \cdot \mathbf{n}\,dA = 4\pi R^3.
\]

10.5. Calculate the flux of $\mathbf{V} = (-y, x, 0)$ out of the rectangular parallelepiped $P$ in $(x,y,z)$-space given by $0 \le x \le 5$, $0 \le y \le 3$, $0 \le z \le 2$.

10.6. Let $\mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^3 : (u,v) \to (x,y,z)$ be the map defined on page 397:

\[
x = \frac{2u}{1 + u^2 + v^2}, \qquad y = \frac{2v}{1 + u^2 + v^2}, \qquad z = \frac{1 - u^2 - v^2}{1 + u^2 + v^2}.
\]

a. Let $S$ be the unit sphere $x^2 + y^2 + z^2 = 1$ minus the “south pole” $(0,0,-1)$. Show that $\mathbf{g}(\mathbb{R}^2) \subset S$.

b. Show that $\mathbf{g}$ maps the $(u,v)$-plane onto $S$ by expressing $(u,v)$ in terms of $(x,y,z)$ when $\mathbf{g}(u,v) = (x,y,z)$. (That is, “invert” $\mathbf{g}$ on $S$.)

10.7. Prove Theorem 10.7 (e.g., by modifying the proof of Theorem 10.6). 

10.8. When $a = 1$, the integral expression for the gravitational field of the hollow sphere (p. 410) involves the improper integral

\[
\int_{-\pi/2}^{\pi/2} \frac{(\sin\varphi - 1)\cos\varphi}{(2 - 2\sin\varphi)^{3/2}}\,d\varphi.
\]

Show that the improper integral converges and has the value $-1$, implying that the $z$-component of the field is $-2\pi G\rho$ when $a = 1$.

10.9. Determine the surface area of the torus

\[
x = (R + a\cos v)\cos u, \qquad y = (R + a\cos v)\sin u, \qquad z = a\sin v,
\]

where $R > a > 0$ and $0 \le u, v \le 2\pi$.

10.10. Calculate the differential $dg$ when

a. $g(x,y) = x^3 - 3xy^2$;

b. $g(x,y) = \sin(xy)$;

c. $g(x,y) = x\cos y - y\sin x$;

d. $g(x,y,z) = \ln\sqrt{x^2 + y^2}$;

e. $g(x,y,z) = xy + yz + zx$;

f. $g(\rho,\varphi,\theta) = \rho\sin\varphi\cos\theta$;

g. $g(x,y) = \arctan(y/x)$;

h. $g(x,y,u,v) = xu - yv$;

i. $g(x,y,u,v) = xu/yv$;

j. $g(x_1, x_2, \ldots, x_n) = x_1 x_2 \cdots x_n$.


10.11. Calculate the differential of each of the following $k$-forms.

a. $\omega(x,y) = y\,dx - x\,dy$;

b. $\omega(x,y) = (x^2 - y^2)\,dx - 2xy\,dy$;

c. $\omega(x,y) = dx/y - dy/x$;

d. $\omega(x,y,z) = (y - z)\,dx + (z - x)\,dy + (x - y)\,dz$;

e. $\omega(x_1, \ldots, x_n) = \sum_{j=2}^{n-1} (x_{j-1} - x_{j+1})\,dx_j$;

f. $\omega(u,v,w) = u^2\,dv\,dw + v^2\,dw\,du + w^2\,du\,dv$;

g. $\omega(x,y,u,v) = (ue^x - ve^y)\,dx\,dy + (x^2 + y^2)\,du\,dv$;

h. $\omega(x,y,u,v) = \sinh u\cosh v\,dx\,dy + \sin x\cos y\,du\,dv$;

i. $\omega(q_1, p_1, q_2, p_2, \ldots, q_n, p_n) = \sum_{j=1}^n p_j\,dq_j$;

j. $\omega(x_1, x_2, \ldots, x_n) = \sum_{j=1}^n (-1)^{j-1} x_j\,dx_1 \cdots \widehat{dx_j} \cdots dx_n$;

k. $\omega(x_1, x_2, \ldots, x_n) = \sum_{j=1}^n x_j\,dx_1 \cdots \widehat{dx_j} \cdots dx_n$.

10.12. Consider the 1-form $dg$ that you obtained in each part of Exercise 10.10. Determine its differential, the 2-form $d^2g = d(dg)$, and confirm $d^2g = 0$ in each case.

10.13. Consider the $(k+1)$-form $d\omega$ that you obtained in each part of Exercise 10.11; confirm that $d^2\omega = 0$ in each case.

10.14. Calculate $du \wedge dv$ when

a. $du = dx$, $dv = 2y\,dy$;

b. $du = \cos\theta\,dr - r\sin\theta\,d\theta$, $dv = \sin\theta\,dr + r\cos\theta\,d\theta$;

c. $du = 2x\,dx - 2y\,dy$, $dv = 2y\,dx + 2x\,dy$;

d. $du = 3(x^2 - y^2)\,dx - 6xy\,dy$, $dv = 6xy\,dx + 3(x^2 - y^2)\,dy$.


10.15. For each of the following 1-forms $\omega$, first show that $d\omega = 0$ and then find a function $f$ for which $\omega = df$. That is, show $\omega$ is the differential of a 0-form (or function).

a. $\omega(x,y) = x\,dx + \cos y\,dy$.

b. $\omega(x,y) = f(x)\,dx + g(y)\,dy$.

c. $\omega(u,v) = 2v\,du + 2u\,dv$.

d. $\omega(x,y,z) = yz\,dx + zx\,dy + xy\,dz$.

e. $\omega(x,y,z) = (y+z)\,dx + (z+x)\,dy + (x+y)\,dz$.

f. $\omega(x,y) = \dfrac{1}{y}\,dx - \dfrac{x}{y^2}\,dy$.

g. $\omega(x,y,u,v) = \dfrac{u}{yv}\,dx - \dfrac{xu}{y^2v}\,dy + \dfrac{x}{yv}\,du - \dfrac{xu}{yv^2}\,dv$.

10.16. For each of the following 2-forms $\alpha$, first show that $d\alpha = 0$ and then find a 1-form $\omega$ for which $d\omega = \alpha$.

a. $\alpha(x,y) = (x-y)\,dx\wedge dy$.

b. $\alpha(x,y) = \varphi(x,y)\,dx\wedge dy$.

c. $\alpha(x,y,z) = dx\wedge dy + dy\wedge dz + dz\wedge dx$.

10.17. Let $\omega = (x^2+y^2)\,dx\,dy\,dz$. Determine $\alpha = P(x,y,z)\,dx\,dy$ and $\beta = Q(x,y,z)\,dy\,dz$ so that $\omega = d\alpha = d\beta$.

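10.18. Let $\omega = \tfrac12(-y\,dx + x\,dy)$, $\alpha = d\omega = dx\,dy$, and let (cf. p. 359)

\[ \varphi : \begin{cases} x = \sin s\cosh t, \\ y = \cos s\sinh t. \end{cases} \]

Determine the pullbacks $\varphi^*\omega$ and $\varphi^*\alpha$ and confirm that $d\varphi^*\omega = \varphi^*\alpha$.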


10.19. (Spherical coordinates). Let

\[ \sigma : \begin{cases} x = \rho\sin\varphi\cos\theta, \\ y = \rho\sin\varphi\sin\theta, \\ z = \rho\cos\varphi. \end{cases} \]

These equations are similar to the spherical coordinates of Exercise 5.10, page 178. The difference is that here $\varphi$ is co-latitude, measuring the angle down from the positive $z$-axis. In the earlier exercise, $\varphi$ was latitude, measuring the angle up from the $(x,y)$-plane. Here $(\rho,\varphi,\theta) \to (x,y,z)$ is seen to be orientation-preserving. (With $\varphi$ representing latitude, however, the order needs to be $(\rho,\theta,\varphi)$: in Exercise 5.10, $\partial(x,y,z)/\partial(\rho,\theta,\varphi) > 0$.)

a. Determine the Jacobian $\partial(x,y,z)/\partial(\rho,\varphi,\theta)$ as a function of $\rho$, $\varphi$, and $\theta$.

b. Determine the differentials $dx$, $dy$, and $dz$ in terms of $\rho$, $\varphi$, $\theta$, and their differentials.

c. Determine the volume element $dx\wedge dy\wedge dz$ in terms of $\rho$, $\varphi$, $\theta$, and the volume element $d\rho\wedge d\varphi\wedge d\theta$. Compare this to the Jacobian you obtained in part (a).

10.20. (Cylindrical coordinates: $(r,\theta,z)$). These replace $x$ and $y$ by polar coordinates while leaving $z$ unchanged: $x = r\cos\theta$, $y = r\sin\theta$, $z = z$.

a. Determine the Jacobian $\partial(x,y,z)/\partial(r,\theta,z)$. Given the relation to polar coordinates in the plane, is this what you would expect?

b. Determine the volume element $dx\wedge dy\wedge dz$ in terms of $r$, $\theta$, $z$, and the volume element $dr\wedge d\theta\wedge dz$. Again, is this what you would expect?

10.21. Determine the pullback $\sigma^*\alpha$, where $\sigma$ is the spherical coordinates map of Exercise 10.19 and $\alpha = x\,dy\,dz + y\,dz\,dx + z\,dx\,dy$.

10.22. Let $\beta = x\,dy\,dz + y\,dz\,dx - 2z\,dx\,dy$, and let

\[ f : \begin{cases} x = a\cos u\cosh v, \\ y = a\sin u\cosh v, \\ z = v. \end{cases} \]

Determine the pullbacks $f^*(dy\,dz)$, $f^*(dz\,dx)$, $f^*(dx\,dy)$, and $f^*(\beta)$.

10.23. Let $S$ be the surface defined parametrically by $x = u+v$, $y = u-v$, $z = v$, where the parameter domain $-1 \le u \le 3$, $0 \le v \le 2$ is positively oriented. Determine

\[ \iint_S xy\,dx\,dy + yz\,dy\,dz + zx\,dx\,dy. \]

10.24. Let $S$ be parametrized by $x = a\cos u$, $y = a\sin u$, $z = v$, where the parameter domain $0 \le u \le 2\pi$, $0 \le v \le h$ is positively oriented.

a. Show that $S$ is a cylinder of radius $a$ whose axis is the $z$-axis. Sketch $S$, showing where the images of the $u$- and $v$-axes lie on the cylinder, and show how this indicates the orientation of $S$.

b. Determine the pullback of the 2-form $\alpha = (x^2+y^2)\,dy\wedge dz$ to the $(u,v)$-plane and then determine $\iint_S \alpha$.

c. Show that $\iint_S f(x,y,z)\,dx\wedge dy = 0$ for any function $f(x,y,z)$.

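10.25. Let $S = m(U)$ be the oriented surface in $\mathbb{R}^4$ parametrized by

\[ m : \begin{cases} p = x^2 + y^2, \\ q = x - y, \\ r = xy, \\ s = x + y; \end{cases} \qquad U : \begin{array}{l} 0 \le x \le 1, \\ 0 \le y \le 1. \end{array} \]

Let $\beta = pq\,dq\wedge dr + qr\,dp\wedge ds$.

a. Determine the pullback $m^*(\beta)$.

b. Determine $\iint_S \beta$.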

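10.26. Sketch the oriented curve $C$,

\[ (x(t), y(t)) = e^{\sin(t/2)}(\cos t, \sin t), \qquad 0 \le t \le 4\pi, \]

and determine its winding number (Definition 10.14, p. 430).

10.27. Prove the claim

\[ \frac{\partial(y,z)}{\partial(v,w)}\frac{\partial(v,w)}{\partial(s,t)} + \frac{\partial(y,z)}{\partial(w,u)}\frac{\partial(w,u)}{\partial(s,t)} + \frac{\partial(y,z)}{\partial(u,v)}\frac{\partial(u,v)}{\partial(s,t)} = \frac{\partial(y,z)}{\partial(s,t)} \]

made in the proof of Lemma 10.1.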

10.28. Prove Theorem 10.16 for differential forms in three variables. 






Chapter 11 

Stokes’ Theorem 


Abstract Stokes’ theorem equates the integral of one expression over a surface to the integral of a related expression over the curve that bounds the surface. A similar result, called Gauss’s theorem, or the divergence theorem, equates the integral of a function over a 3-dimensional region to the integral of a related expression over the surface that bounds the region. The similarities are not accidental. Using the language of differential forms, we show these two theorems are instances (along with Green’s theorem and the fundamental theorem of calculus) of a single theorem that connects one integral over a domain to a related one over its boundary. To explore the connections, we combine the “modern” approach, using differential forms to clarify statements and proofs, with the “classical” approach, using vector fields to understand the individual theorems in the physical terms in which they arose.


11.1 Divergence 


In this section, we analyze the flux of a continuously differentiable vector field $\mathbf{V} = (P,Q,R)$ through the boundary $\partial D$ of a solid region $D$, in the direction of the normal on $\partial D$ that points out of $D$. We saw, in an example worked out on pages 420-422, that the net flux could be nonzero. In other words, inward flow and outward flow need not always balance.

First take $D$ to be a parallelepiped $B$ whose edges are parallel to the coordinate axes (a box). Suppose $B$ is centered at the point $(x,y,z) = (a,b,c)$ and has length $\Delta x$, width $\Delta y$, and height $\Delta z$. Its boundary $\partial B$ consists of three pairs of plane parallel faces. The face $S_{x+}$ of $\partial B$ that lies in the plane $x = a + \Delta x/2$ has area $\Delta y\,\Delta z$ and outward normal $\mathbf{n}_x = (1,0,0)$. If the box is sufficiently small, we can approximate $\mathbf{V}$ everywhere on $S_{x+}$ by its value at the center $(x,y,z) = (a + \Delta x/2, b, c)$ of $S_{x+}$. Under this assumption, total flux (Definition 10.1, p. 389) through $S_{x+}$ is approximately

\[ \Phi_{x+} \approx \mathbf{V}(a + \Delta x/2, b, c)\cdot\mathbf{n}_x\,\Delta y\,\Delta z = P(a + \Delta x/2, b, c)\,\Delta y\,\Delta z. \]

Approximate total flux
through a pair of faces


Approximate total flux
out of the box


The parallel face $S_{x-}$ that lies in the plane $x = a - \Delta x/2$ has the same area but the opposite outward normal $-\mathbf{n}_x$. Approximating $\mathbf{V}$ everywhere on $S_{x-}$ by its value at $(a - \Delta x/2, b, c)$, we can then write

\[ \Phi_{x-} \approx \mathbf{V}(a - \Delta x/2, b, c)\cdot(-\mathbf{n}_x)\,\Delta y\,\Delta z = -P(a - \Delta x/2, b, c)\,\Delta y\,\Delta z. \]

Therefore,

\[ \Phi_{x+} + \Phi_{x-} \approx \bigl(P(a + \Delta x/2, b, c) - P(a - \Delta x/2, b, c)\bigr)\,\Delta y\,\Delta z. \]

By the microscope equation,

\[ P(a + \Delta x/2, b, c) - P(a - \Delta x/2, b, c) \approx \frac{\partial P}{\partial x}(a,b,c)\,\Delta x \]

when $\Delta x \approx 0$, so

\[ \Phi_{x+} + \Phi_{x-} \approx \frac{\partial P}{\partial x}(a,b,c)\,\Delta x\,\Delta y\,\Delta z = \frac{\partial P}{\partial x}(a,b,c)\,\Delta V, \]

where $\Delta V$ is the volume of the box $B$.

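There are similar formulas for the other faces. For the pair $S_{y\pm}$ that lie in the planes $y = b \pm \Delta y/2$, the normals are $\pm\mathbf{n}_y = (0,\pm 1,0)$, and we find

\[ \Phi_{y+} + \Phi_{y-} \approx \bigl(Q(a, b + \Delta y/2, c) - Q(a, b - \Delta y/2, c)\bigr)\,\Delta z\,\Delta x \approx \frac{\partial Q}{\partial y}(a,b,c)\,\Delta y\,\Delta z\,\Delta x = \frac{\partial Q}{\partial y}(a,b,c)\,\Delta V. \]

Similarly, for $S_{z\pm}$ we have

\[ \Phi_{z+} + \Phi_{z-} \approx \bigl(R(a, b, c + \Delta z/2) - R(a, b, c - \Delta z/2)\bigr)\,\Delta x\,\Delta y \approx \frac{\partial R}{\partial z}(a,b,c)\,\Delta V. \]

Therefore, we estimate the total flux through $\partial B$ in the outward direction to be

\[ \Phi \approx \left.\left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\right)\right|_{(a,b,c)} \Delta V. \]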


For boxes that are small enough for this formula to provide a good approximation, we find that total flux is proportional to the volume of the box. It is remarkable that any formula for $\Phi$ should involve 3-dimensional volume, because $\Phi$ measures flow through 2-dimensional surfaces. The proportionality factor that connects flux to volume is the scalar quantity

\[ \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}, \]

evaluated at the center of the box. Thus, although total flux of $\mathbf{V}$ must certainly depend on $\mathbf{V}$, we find that when the surface is the complete boundary of a small box, total flux depends only on a certain scalar, called the divergence, derived from the components of $\mathbf{V}$.
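The small-box estimate can be tried out numerically. The following is a minimal sketch, not part of the text's argument: the sample field `V` and the box center are arbitrary choices made here for illustration, and the face integrals are approximated with a midpoint rule. It compares the outward flux through a small box with the product of the divergence at the center and the volume.

```python
import math

def V(x, y, z):
    # Hypothetical sample field (P, Q, R), chosen only for illustration.
    return (x * x * y, y * z, math.sin(z))

def div_V(x, y, z):
    # dP/dx + dQ/dy + dR/dz for the sample field above.
    return 2 * x * y + z + math.cos(z)

def box_flux(center, size, n=20):
    """Outward flux of V through the boundary of a box (midpoint rule on each face)."""
    a, b, c = center
    dx, dy, dz = size
    total = 0.0
    for i in range(n):
        for j in range(n):
            s = (i + 0.5) / n - 0.5  # offsets running over (-1/2, 1/2)
            t = (j + 0.5) / n - 0.5
            # Pair of faces x = a +/- dx/2, outward normals (+/-1, 0, 0).
            y, z = b + s * dy, c + t * dz
            total += (V(a + dx / 2, y, z)[0] - V(a - dx / 2, y, z)[0]) * dy * dz / n**2
            # Pair of faces y = b +/- dy/2, outward normals (0, +/-1, 0).
            x, z = a + s * dx, c + t * dz
            total += (V(x, b + dy / 2, z)[1] - V(x, b - dy / 2, z)[1]) * dz * dx / n**2
            # Pair of faces z = c +/- dz/2, outward normals (0, 0, +/-1).
            x, y = a + s * dx, b + t * dy
            total += (V(x, y, c + dz / 2)[2] - V(x, y, c - dz / 2)[2]) * dx * dy / n**2
    return total

center = (0.5, 1.0, 0.3)
for h in (0.1, 0.01):
    flux = box_flux(center, (h, h, h))
    estimate = div_V(*center) * h**3
    print(h, flux, estimate, flux / estimate)
```

The ratio in the last column should approach 1 as the box shrinks, which is the sense in which flux is proportional to volume for small boxes.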


Definition 11.1 The divergence of the vector field $\mathbf{V} = (P,Q,R)$ is the scalar field (i.e., function)

\[ \operatorname{div}\mathbf{V} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}. \]


To illustrate, consider our earlier example (pp. 420-422) of the total flux $\Phi$ of the vector field $\mathbf{V} = (x+y, y-x, 0)$ out of the unit cube. Here

\[ \operatorname{div}\mathbf{V} = \frac{\partial}{\partial x}(x+y) + \frac{\partial}{\partial y}(y-x) + \frac{\partial}{\partial z}0 = 1 + 1 = 2, \]

a constant. The volume of the unit cube is $\Delta V = 1$; therefore we obtain the estimate $\Phi \approx 2 \times 1 = 2$. In fact, we already found $\Phi = 2$ by direct calculation. Even though the unit cube is not “small,” the estimate still works well because $\operatorname{div}\mathbf{V}$ is the same at all points.
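As a quick numerical cross-check of this example, here is a short sketch (the function name `unit_cube_flux` and the grid size are choices made here, not taken from the text) that sums the outward flux of $(x+y, y-x, 0)$ over the six faces of the unit cube with a midpoint rule.

```python
def V(x, y, z):
    # The field from the example: (P, Q, R) = (x + y, y - x, 0).
    return (x + y, y - x, 0.0)

def unit_cube_flux(field, n=50):
    """Outward flux of `field` through the boundary of the unit cube [0,1]^3."""
    total = 0.0
    w = 1.0 / n**2  # area of one grid cell on a face
    for i in range(n):
        for j in range(n):
            s, t = (i + 0.5) / n, (j + 0.5) / n
            # Faces x = 1 and x = 0: outward normals (1,0,0) and (-1,0,0).
            total += (field(1.0, s, t)[0] - field(0.0, s, t)[0]) * w
            # Faces y = 1 and y = 0.
            total += (field(s, 1.0, t)[1] - field(s, 0.0, t)[1]) * w
            # Faces z = 1 and z = 0.
            total += (field(s, t, 1.0)[2] - field(s, t, 0.0)[2]) * w
    return total

flux = unit_cube_flux(V)
print(flux)
```

The printed value should agree, up to rounding, with the estimate $\operatorname{div}\mathbf{V}\cdot\Delta V = 2$.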

The product $\operatorname{div}\mathbf{V}\,\Delta V$ in a small box leads us to a triple integral in a larger region. On page 309, we introduced the triple integral of a function $f(x,y,z)$ over a region $D$ in $(x,y,z)$-space that has volume (3-dimensional Jordan content). In particular, if $f(x,y,z)$ is bounded and continuous on $D$, then

\[ \iiint_D f(x,y,z)\,dV \]

exists (Theorem 8.35, p. 305, adapted from double to triple integrals). Furthermore, if $\delta(D)$, the diameter of $D$ (Definition 8.14, p. 291), is sufficiently small, then it follows from Corollary 8.30, p. 299, that

\[ f(a,b,c)\,\Delta V \approx \iiint_D f(x,y,z)\,dV, \]

where $(a,b,c)$ is a point in $D$. Hence, when $\mathbf{V}$ is continuously differentiable, so that $\operatorname{div}\mathbf{V}$ is continuous, and $B$ is a box with small diameter,

\[ \operatorname{div}\mathbf{V}(a,b,c)\,\Delta V \approx \iiint_B \operatorname{div}\mathbf{V}\,dV. \]


The divergence of 
a vector field 


From $\operatorname{div}\mathbf{V}\,\Delta V$
to a triple integral
Connecting $\operatorname{div}\mathbf{V}\,dV$
and $\mathbf{V}\cdot\mathbf{n}\,dA$


Divergence theorem
for a unit cube


But because we have just found that $\operatorname{div}\mathbf{V}(a,b,c)\,\Delta V$ approximates the total flux $\Phi$ through $\partial B$ in the outward unit direction $\mathbf{n}$, we can also write

\[ \operatorname{div}\mathbf{V}(a,b,c)\,\Delta V \approx \iint_{\partial B} \mathbf{V}\cdot\mathbf{n}\,dA, \quad\text{implying}\quad \iiint_B \operatorname{div}\mathbf{V}\,dV \approx \iint_{\partial B} \mathbf{V}\cdot\mathbf{n}\,dA. \]

In fact, we show that these two integrals are actually equal: they both represent the total flux. More generally, we show that, for a large class of regions $D$, the triple integral of $\operatorname{div}\mathbf{V}$ over $D$ equals the surface integral of $\mathbf{V}\cdot\mathbf{n}$ over $\partial D$. This equality is called the divergence theorem, or Gauss’s theorem.


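Theorem 11.1. Let $B$ be the unit cube, $0 \le x,y,z \le 1$, and $\mathbf{n}$ the outward unit normal on $\partial B$. Let $\mathbf{V}$ be a continuously differentiable vector field defined on an open set containing $B$; then

\[ \iiint_B \operatorname{div}\mathbf{V}\,dV = \iint_{\partial B} \mathbf{V}\cdot\mathbf{n}\,dA. \]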

Proof. To make the proof clearer, we convert the integrands to differential forms. If $\mathbf{V} = (P,Q,R)$, then (cf. pp. 403-404)

\[ \mathbf{V}\cdot\mathbf{n}\,dA = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy, \qquad \operatorname{div}\mathbf{V}\,dV = (P_x + Q_y + R_z)\,dx\,dy\,dz. \]

Now that the integrands are differential forms, the domains of integration must be oriented. Let $B$ have the positive orientation given by the standard basis vectors of $\mathbb{R}^3$ in their usual order. Let $\partial B$ have the orientation induced by $B$; this is the orientation given by $\mathbf{n}$, the outward unit normal on $\partial B$.

Now we show that

\[ \iiint_B P_x\,dx\,dy\,dz = \iint_{\partial B} P\,dy\,dz. \]

A similar approach can be used to show the other two pairs of components are equal, thus completing the proof.

Let us label the faces of $\partial B$ using the notation from the example on pages 420-422. Thus, for example, $S_{x=0}$ is the face (properly oriented) that lies in the plane $x = 0$, and

\[ \partial B = S_{x=0} + S_{x=1} + S_{y=0} + S_{y=1} + S_{z=0} + S_{z=1}. \]

Because $dy = 0$ on the faces $S_{y=0}$ and $S_{y=1}$, and because $dz = 0$ on the faces $S_{z=0}$ and $S_{z=1}$, we find

\[ \begin{aligned} \iint_{\partial B} P\,dy\,dz &= \iint_{S_{x=0}} P\,dy\,dz + \iint_{S_{x=1}} P\,dy\,dz \\ &= \int_0^1\!\!\int_0^1 -P(0,y,z)\,dy\,dz + \int_0^1\!\!\int_0^1 P(1,y,z)\,dy\,dz \\ &= \int_0^1\!\!\int_0^1 \bigl(P(1,y,z) - P(0,y,z)\bigr)\,dy\,dz. \end{aligned} \]

As we saw with similar computations on page 421, to take the orientation of $S_{x=0}$ properly into account when we use $y$ and $z$ as parameters, we must include the minus sign in the integral of $P(0,y,z)$.

We can compute the triple integral as a simple (threefold) iterated integral:

\[ \iiint_B P_x\,dx\,dy\,dz = \int_0^1\!\!\int_0^1\left(\int_0^1 P_x(x,y,z)\,dx\right)dy\,dz = \int_0^1\!\!\int_0^1 \bigl(P(1,y,z) - P(0,y,z)\bigr)\,dy\,dz. \]

Thus the surface integral and the triple integral are equal; by the remark made above, this completes the proof. □

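Theorem 11.2. Let $B$ be the unit ball, $x^2 + y^2 + z^2 \le 1$, and $\mathbf{n}$ the outward unit normal on $\partial B$. Let $\mathbf{V}$ be a continuously differentiable vector field defined on an open set containing $B$; then

\[ \iiint_B \operatorname{div}\mathbf{V}\,dV = \iint_{\partial B} \mathbf{V}\cdot\mathbf{n}\,dA. \]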

Proof. Again we convert the integrands to differential forms, orient the domains appropriately, and then show that

\[ \iiint_B P_x\,dx\,dy\,dz = \iint_{\partial B} P\,dy\,dz; \]

similar arguments prove that the other two pairs of components are equal.

To determine the surface integral, let $S_+$ and $S_-$ be the graphs of the functions

\[ S_+ : x = +\sqrt{1 - y^2 - z^2} \quad\text{and}\quad S_- : x = -\sqrt{1 - y^2 - z^2}, \]

defined on the positively oriented disk $K : y^2 + z^2 \le 1$, and inheriting their orientations from $K$. (In the figure, the surfaces are shown separated for clarity.) The orientation normals $\mathbf{N}_+$ and $\mathbf{N}_-$ of both surfaces therefore point in the positive $x$-direction. Thus, $\mathbf{N}_+$ points outward, but $\mathbf{N}_-$ points inward, so $\partial B = S_+ - S_-$, and

\[ \begin{aligned} \iint_{\partial B} P\,dy\,dz &= \iint_{S_+} P\,dy\,dz - \iint_{S_-} P\,dy\,dz \\ &= \iint_K P\bigl(\sqrt{1-y^2-z^2},y,z\bigr)\,dy\,dz - \iint_K P\bigl(-\sqrt{1-y^2-z^2},y,z\bigr)\,dy\,dz \\ &= \iint_K \Bigl(P\bigl(\sqrt{1-y^2-z^2},y,z\bigr) - P\bigl(-\sqrt{1-y^2-z^2},y,z\bigr)\Bigr)\,dy\,dz. \end{aligned} \]

Divergence theorem
for a unit ball


Divergence theorem
for a unit tetrahedron

To determine the triple integral, let $B$ be the positively oriented solid region given by the inequalities

\[ B : \quad y^2 + z^2 \le 1, \qquad -\sqrt{1 - y^2 - z^2} \le x \le \sqrt{1 - y^2 - z^2}. \]

Then

\[ \begin{aligned} \iiint_B P_x\,dx\,dy\,dz &= \iint_K \left(\int_{-\sqrt{1-y^2-z^2}}^{\sqrt{1-y^2-z^2}} P_x(x,y,z)\,dx\right) dy\,dz \\ &= \iint_K P(x,y,z)\Bigr|_{x=-\sqrt{1-y^2-z^2}}^{x=\sqrt{1-y^2-z^2}}\,dy\,dz \\ &= \iint_K \Bigl(P\bigl(\sqrt{1-y^2-z^2},y,z\bigr) - P\bigl(-\sqrt{1-y^2-z^2},y,z\bigr)\Bigr)\,dy\,dz, \end{aligned} \]

so the triple integral is equal to the surface integral. By what has been said above, this proves the theorem. □


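Theorem 11.3. Let $B$ be the unit tetrahedron, $0 \le x,y,z$, $x+y+z \le 1$, and $\mathbf{n}$ the outward unit normal on $\partial B$. Let $\mathbf{V}$ be a continuously differentiable vector field defined on an open set containing $B$; then

\[ \iiint_B \operatorname{div}\mathbf{V}\,dV = \iint_{\partial B} \mathbf{V}\cdot\mathbf{n}\,dA. \]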

Proof. In Exercise 11.8, you are asked to prove this theorem using differential forms, following the pattern of the last two proofs. For the sake of illustration, we take an alternate approach, integrating the scalar functions $\operatorname{div}\mathbf{V}$ and $\mathbf{V}\cdot\mathbf{n}$ directly over the unoriented domains $B$ and $\partial B$, respectively. If $\mathbf{V} = (P,Q,R)$, then

\[ \iiint_B \operatorname{div}\mathbf{V}\,dV = \iiint_B (P_x + Q_y + R_z)\,dV. \]

We convert each term into an iterated integral, describing $B$ by inequalities in three different ways, each suited to the term being integrated. To integrate $P_x$, let

\[ B : \quad 0 \le y \le 1, \qquad 0 \le z \le 1 - y, \qquad 0 \le x \le 1 - y - z; \]

then

\[ \begin{aligned} \iiint_B P_x\,dV &= \int_0^1\!\!\int_0^{1-y}\left(\int_0^{1-y-z} P_x(x,y,z)\,dx\right)dz\,dy \\ &= \int_0^1\!\!\int_0^{1-y} P(x,y,z)\Bigr|_{x=0}^{x=1-y-z}\,dz\,dy \\ &= \int_0^1\!\!\int_0^{1-y} \bigl(P(1-y-z,y,z) - P(0,y,z)\bigr)\,dz\,dy. \end{aligned} \]


By changing the description of $B$ appropriately, we obtain similar expressions for the integrals of $Q_y$ and $R_z$:

\[ \iiint_B Q_y\,dV = \int_0^1\!\!\int_0^{1-z} \bigl(Q(x,1-x-z,z) - Q(x,0,z)\bigr)\,dx\,dz, \]

\[ \iiint_B R_z\,dV = \int_0^1\!\!\int_0^{1-x} \bigl(R(x,y,1-x-y) - R(x,y,0)\bigr)\,dy\,dx. \]

(Recall that the order of the differentials here indicates merely the order in which the integrations are to be carried out, not the orientation of the domain of integration.)

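In the tetrahedral surface $\partial B$, three faces $S_{x=0}$, $S_{y=0}$, and $S_{z=0}$ lie in coordinate planes; the fourth, $S_1$, lies in the plane $x+y+z = 1$. Because the outward unit normal on $S_{x=0}$ is $\mathbf{n} = (-1,0,0)$, we have

\[ \iint_{S_{x=0}} \mathbf{V}\cdot\mathbf{n}\,dA = \iint_{S_{x=0}} (P,Q,R)\cdot(-1,0,0)\,dA = \int_0^1\!\!\int_0^{1-y} -P(0,y,z)\,dz\,dy. \]

In a similar way, $\mathbf{n} = (0,-1,0)$ on $S_{y=0}$ and $\mathbf{n} = (0,0,-1)$ on $S_{z=0}$, so

\[ \iint_{S_{y=0}} \mathbf{V}\cdot\mathbf{n}\,dA = \int_0^1\!\!\int_0^{1-z} -Q(x,0,z)\,dx\,dz, \qquad \iint_{S_{z=0}} \mathbf{V}\cdot\mathbf{n}\,dA = \int_0^1\!\!\int_0^{1-x} -R(x,y,0)\,dy\,dx. \]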

On the fourth face, $S_1$, the outward unit normal is $\mathbf{n} = \bigl(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\bigr)$, so

\[ \iint_{S_1} \mathbf{V}\cdot\mathbf{n}\,dA = \iint_{S_1} \frac{P + Q + R}{\sqrt3}\,dA. \]

To integrate the first term, $P/\sqrt3$, treat $S_1$ as the graph of

\[ x = 1 - y - z \quad\text{on}\quad S_{x=0} : \quad 0 \le y \le 1, \qquad 0 \le z \le 1 - y. \]

Symbolic form of the
divergence theorem

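To get an expression of $dA$, we must calculate (cf. Definition 10.6, p. 409)

\[ \sqrt{\left[\frac{\partial(y,z)}{\partial(y,z)}\right]^2 + \left[\frac{\partial(z,x)}{\partial(y,z)}\right]^2 + \left[\frac{\partial(x,y)}{\partial(y,z)}\right]^2} = \sqrt{\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix}^2 + \begin{vmatrix} 0 & -1 \\ 1 & -1 \end{vmatrix}^2 + \begin{vmatrix} -1 & 1 \\ -1 & 0 \end{vmatrix}^2} = \sqrt3. \]

Thus $dA = \sqrt3\,dy\,dz$, and

\[ \iint_{S_1} \frac{P}{\sqrt3}\,dA = \int_0^1\!\!\int_0^{1-y} P(1-y-z,y,z)\,dz\,dy. \]

For $Q/\sqrt3$, treat $S_1$ as the graph of $y = 1 - x - z$ on $S_{y=0}$, and for $R/\sqrt3$, treat it as $z = 1 - x - y$ on $S_{z=0}$; then

\[ \iint_{S_1} \frac{Q}{\sqrt3}\,dA = \int_0^1\!\!\int_0^{1-z} Q(x,1-x-z,z)\,dx\,dz, \qquad \iint_{S_1} \frac{R}{\sqrt3}\,dA = \int_0^1\!\!\int_0^{1-x} R(x,y,1-x-y)\,dy\,dx. \]

The triple integral and the surface integral reduce to six iterated double integrals each, and these are equal in pairs. □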
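The factor $\sqrt3$ in $dA = \sqrt3\,dy\,dz$ can also be seen from the cross product of the tangent vectors of the parametrization $(y,z) \mapsto (1-y-z, y, z)$ of $S_1$. The sketch below is only an illustration (the helper `cross` and the variable names are introduced here); the three components of the cross product are the three Jacobians that appear under the square root.

```python
import math

def cross(a, b):
    # Cross product of two vectors in R^3.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Partial derivatives of r(y, z) = (1 - y - z, y, z) with respect to y and z.
r_y = (-1.0, 1.0, 0.0)
r_z = (-1.0, 0.0, 1.0)

normal = cross(r_y, r_z)                              # components are the three Jacobians
area_factor = math.sqrt(sum(c * c for c in normal))   # dA = area_factor * dy dz

# The parameter triangle 0 <= y <= 1, 0 <= z <= 1 - y has area 1/2.
area_S1 = area_factor * 0.5

print(normal, area_factor, area_S1)
```

The cross product is a positive multiple of the outward normal $(1,1,1)/\sqrt3$, and the resulting area $\sqrt3/2$ is that of an equilateral triangle with side $\sqrt2$, as it should be for $S_1$.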

We must still show that the divergence theorem applies to other regions. To do this, it is helpful if we think of the integrals in the theorem as symbolic pairings (cf. p. 427). Thus if $\mathbf{V} = (P,Q,R)$ and

\[ \mathbf{V}\cdot\mathbf{n}\,dA = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy = \alpha, \qquad \operatorname{div}\mathbf{V}\,dV = (P_x + Q_y + R_z)\,dx\,dy\,dz = d\alpha, \]

we write

\[ \iiint_B \operatorname{div}\mathbf{V}\,dV = \iiint_B d\alpha = \langle B, d\alpha\rangle, \qquad \iint_{\partial B} \mathbf{V}\cdot\mathbf{n}\,dA = \iint_{\partial B} \alpha = \langle \partial B, \alpha\rangle. \]

Images under
coordinate changes

Here $B$ is the positively oriented unit cube, unit ball, or unit tetrahedron, and $\partial B$ is its boundary with the induced orientation. In terms of symbolic pairings, the divergence theorem thus has the form

\[ \langle B, d\alpha\rangle = \langle \partial B, \alpha\rangle. \]

Note that when Green’s theorem and the fundamental theorem of calculus are expressed in terms of symbolic pairings (pp. 427-429), they have exactly the same form. The essential point of each theorem is that the exterior derivative $d$ and the boundary operator $\partial$ are adjoints in the symbolic pairing.

We can now extend the divergence theorem to any region $D$ that is the image, under a coordinate change, of a region (such as a unit cube) to which the divergence theorem already applies.

Theorem 11.4. Let $\varphi : \Omega \to \mathbb{R}^3$ be a coordinate change, and let $B \subset \Omega$ be a region for which the divergence theorem is known to hold. Suppose $D = \varphi(B)$, $\mathbf{n}$ is the outward unit normal on $\partial D$, and $\mathbf{V}$ is a continuously differentiable vector field defined on $\varphi(\Omega)$; then

\[ \iiint_D \operatorname{div}\mathbf{V}\,dV = \iint_{\partial D} \mathbf{V}\cdot\mathbf{n}\,dA. \]

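Proof. We convert to differential forms, as in the earlier proofs. Thus, let $\alpha = \mathbf{V}\cdot\mathbf{n}\,dA$, $d\alpha = \operatorname{div}\mathbf{V}\,dV$. Let $\vec B$ be $B$ with its positive orientation, let $\partial\vec B$, $\vec D = \varphi(\vec B)$, and $\partial\vec D = \varphi(\partial\vec B)$ receive the appropriate induced orientations (cf. p. 355), and let $\varphi^*$ be the pullback of $\varphi$ on forms (cf. pp. 433ff). Then

\[ \begin{aligned} \iiint_D \operatorname{div}\mathbf{V}\,dV = \iiint_{\vec D} d\alpha &= \langle \vec D, d\alpha\rangle \\ &= \langle \varphi(\vec B), d\alpha\rangle && \text{definition of } \vec D \\ &= \langle \vec B, \varphi^*(d\alpha)\rangle && \varphi \text{ and } \varphi^* \text{ are adjoints} \\ &= \langle \vec B, d(\varphi^*\alpha)\rangle && d \text{ and } \varphi^* \text{ commute} \\ &= \langle \partial\vec B, \varphi^*\alpha\rangle && \text{divergence theorem for } B \\ &= \langle \varphi(\partial\vec B), \alpha\rangle && \varphi \text{ and } \varphi^* \text{ are adjoints} \\ &= \langle \partial\vec D, \alpha\rangle && \text{definition of } \partial\vec D \\ &= \iint_{\partial\vec D} \alpha = \iint_{\partial D} \mathbf{V}\cdot\mathbf{n}\,dA. && \square \end{aligned} \]

At two key points in the proof, we use the fact that a coordinate change $\varphi$ and its pullback $\varphi^*$ on differential forms are adjoints (Theorem 10.15, p. 436). This is just the change of variables formula for integrals expressed in terms of symbolic pairings.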

A good example of the image of a cube under a coordinate change in $\mathbb{R}^3$ is the region $D$ in $(x,y,z)$-space between two graphs $z = A(x,y)$ and $z = B(x,y)$, when $A$ and $B$ have a common domain of the form

\[ D_0 : \quad a \le x \le b, \qquad \alpha(x) \le y \le \beta(x). \]

We assume that $\alpha$ and $\beta$ are continuously differentiable on some open interval $I$ containing $[a,b]$, and $A$ and $B$ are likewise continuously differentiable on some open set $\Omega_0$ containing $D_0$ in the $(x,y)$-plane. The inequalities defining $D$ allow us to transform a triple integral over $D$ into a threefold iterated integral:

\[ \iiint_D f(x,y,z)\,dV = \int_a^b\left(\int_{\alpha(x)}^{\beta(x)}\left(\int_{A(x,y)}^{B(x,y)} f(x,y,z)\,dz\right)dy\right)dx. \]

The image of a cube

Theorem 11.5. Suppose there are continuously differentiable functions $\alpha(x)$, $\beta(x)$, $A(x,y)$, and $B(x,y)$ for which $\alpha_0 < \alpha(x) < \beta(x) < \beta_0$ for every $x$ in an open interval $I$, and $A_0 < A(x,y) < B(x,y) < B_0$ for every $(x,y)$ in an open set $\Omega_0$. Suppose $[a,b] \subset I$, and suppose $D_0 \subset \Omega_0$ is given by

\[ D_0 : \quad a \le x \le b, \qquad \alpha(x) \le y \le \beta(x). \]

If $D$ is the region in $(x,y,z)$-space defined by

\[ D : \quad (x,y) \in D_0, \qquad A(x,y) \le z \le B(x,y), \]

then $D$ is the image of the unit cube $B$ under a coordinate change.


Proof. Let $\Omega$ be the open set in $\mathbb{R}^3$ constructed as the product of $\Omega_0$ in the $(x,y)$-plane and the open interval $(A_0,B_0)$ on the $z$-axis: $\Omega = \Omega_0 \times (A_0,B_0)$. Now define $\varphi : \Omega \to \mathbb{R}^3$ as

\[ \varphi : \begin{cases} u = \dfrac{x - a}{b - a}, \\[1.5ex] v = \dfrac{y - \alpha(x)}{\beta(x) - \alpha(x)}, \\[1.5ex] w = \dfrac{z - A(x,y)}{B(x,y) - A(x,y)}. \end{cases} \]

Because the components are continuously differentiable and the denominators are never zero on $\Omega$, $\varphi$ is well-defined and continuously differentiable.

If $a \le x \le b$, then $0 \le u \le 1$. If, in addition, $\alpha(x) \le y \le \beta(x)$, then $0 \le v \le 1$. Finally, if $A(x,y) \le z \le B(x,y)$ as well, then $0 \le w \le 1$. Thus, $\varphi(D) = B$, the unit cube. We can solve for $x$, $y$, and $z$ to get the inverse:

\[ \varphi^{-1} : \begin{cases} x = a + (b - a)u, \\ y = \alpha(x) + (\beta(x) - \alpha(x))v, \\ z = A(x,y) + (B(x,y) - A(x,y))w, \end{cases} \]

understanding that $x$ will be replaced by $a + (b-a)u$ in the formula for $y$, and these expressions will then replace $x$ and $y$ in the formula for $z$. Hence $\varphi^{-1}$ is continuously differentiable, so $\varphi^{-1}$ is a coordinate change for which $D = \varphi^{-1}(B)$. □
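To make the construction concrete, here is a minimal sketch of the map and its inverse for one particular choice of data. The specific functions `alpha`, `beta`, `A`, `B` and the interval endpoints are hypothetical, picked only so that $\alpha < \beta$ and $A < B$ hold; they are not from the text. The inverse is computed in the nested order described in the proof: first $x$, then $y$, then $z$.

```python
import math

# Hypothetical sample data (assumptions for illustration only).
a, b = 0.0, 2.0

def alpha(x):
    return -1.0 - 0.1 * x * x

def beta(x):
    return 1.0 + 0.5 * x

def A(x, y):
    return -1.0 - 0.2 * math.sin(x + y)   # lower graph z = A(x, y)

def B(x, y):
    return 2.0 + 0.3 * math.cos(x * y)    # upper graph z = B(x, y)

def phi(x, y, z):
    """Map a point of D to the unit cube."""
    u = (x - a) / (b - a)
    v = (y - alpha(x)) / (beta(x) - alpha(x))
    w = (z - A(x, y)) / (B(x, y) - A(x, y))
    return (u, v, w)

def phi_inv(u, v, w):
    """Map a point of the unit cube back to D, solving for x, then y, then z."""
    x = a + (b - a) * u
    y = alpha(x) + (beta(x) - alpha(x)) * v
    z = A(x, y) + (B(x, y) - A(x, y)) * w
    return (x, y, z)

point = (0.3, 0.7, 0.2)
print(phi_inv(*point), phi(*phi_inv(*point)))
```

Composing the two maps should return the starting point up to rounding, and the corners of the cube should land on the edges of $D$ (for instance, $w = 0$ gives a point on the lower graph $z = A(x,y)$).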


Corollary 11.6 The divergence theorem holds for the region $D$.

□

The solid region $D$ is the 3-dimensional analogue of the basic 2-dimensional region on which we established Green’s theorem (cf. Theorem 9.18, p. 364, and pp. 367-368). In the final version of Green’s theorem, we assumed the domain can be decomposed into a finite number of nonoverlapping regions on which Green’s theorem is known to apply. Our final version of the divergence theorem is similar.

Theorem 11.7 (Divergence theorem). Suppose $\mathbf{V}$ is a continuously differentiable vector field defined on an open set $\Omega$ in $\mathbb{R}^3$, and $D$ is a closed bounded subset of $\Omega$. Suppose $D_1, \ldots, D_k$ are nonoverlapping regions in $\mathbb{R}^3$ on which the divergence theorem applies, and $D = D_1 + \cdots + D_k$ when all regions are positively oriented; then

\[ \iiint_D \operatorname{div}\mathbf{V}\,dV = \iint_{\partial D} \mathbf{V}\cdot\mathbf{n}\,dA. \]

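Proof. By the additivity of triple integrals, we know immediately that

\[ \iiint_D \operatorname{div}\mathbf{V}\,dV = \iiint_{D_1} \operatorname{div}\mathbf{V}\,dV + \cdots + \iiint_{D_k} \operatorname{div}\mathbf{V}\,dV. \]

The surface integrals combine in a more interesting way. If two cells $D_i$ and $D_j$ meet along a face $S$, then, at any point $\mathbf{p}$ on $S$, the outward normal $\mathbf{n}_j(\mathbf{p})$ from $D_j$ is opposite the outward normal $\mathbf{n}_i(\mathbf{p})$ from $D_i$: $\mathbf{n}_j(\mathbf{p}) = -\mathbf{n}_i(\mathbf{p})$. Therefore,

\[ \underbrace{\iint_S \mathbf{V}\cdot\mathbf{n}_j\,dA}_{S \text{ as part of } \partial D_j} = \iint_S \mathbf{V}\cdot(-\mathbf{n}_i)\,dA = -\underbrace{\iint_S \mathbf{V}\cdot\mathbf{n}_i\,dA}_{S \text{ as part of } \partial D_i}, \]

so the contributions that $S$ makes to

\[ \iint_{\partial D_i} \mathbf{V}\cdot\mathbf{n}\,dA \quad\text{and}\quad \iint_{\partial D_j} \mathbf{V}\cdot\mathbf{n}\,dA \]

exactly cancel. The only contributions that do not cancel are from the faces $S$ that $\partial D_i$ shares with $\partial D$ itself. In those circumstances (and because $D_i$ lies in $D$), the outward normal on $S$ is the same for $D_i$ and for $D$, so

\[ \underbrace{\iint_S \mathbf{V}\cdot\mathbf{n}\,dA}_{S \text{ as part of } \partial D_i} = \underbrace{\iint_S \mathbf{V}\cdot\mathbf{n}\,dA}_{S \text{ as part of } \partial D}. \]

Therefore, after all the cancellations are taken into account,

\[ \iint_{\partial D} \mathbf{V}\cdot\mathbf{n}\,dA = \iint_{\partial D_1} \mathbf{V}\cdot\mathbf{n}\,dA + \cdots + \iint_{\partial D_k} \mathbf{V}\cdot\mathbf{n}\,dA. \]

By hypothesis,

\[ \iiint_{D_i} \operatorname{div}\mathbf{V}\,dV = \iint_{\partial D_i} \mathbf{V}\cdot\mathbf{n}\,dA \]

for each $i = 1, \ldots, k$, so the proof is complete. □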


The divergence
theorem on more
general regions


The differential forms
corresponding to a field


$d(\omega^2_{\mathbf{V}}) = \omega^3_{\operatorname{div}\mathbf{V}}$


$d(\omega^0_f) = \omega^1_{\operatorname{grad} f}$

11.2 Circulation and vorticity 

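In the previous section we used the connection between a vector field and its divergence, on the one hand, and a corresponding 2-form and its exterior derivative, on the other. Quite generally, there is a natural way to make a vector field correspond to either a 1-form or a 2-form, and a scalar field (i.e., a function) to either a 0-form or a 3-form:

\[ \begin{aligned} f &\leftrightarrow \omega^0_f = f, \\ \mathbf{F} = (A,B,C) &\leftrightarrow \omega^1_{\mathbf{F}} = A\,dx + B\,dy + C\,dz, \\ \mathbf{V} = (P,Q,R) &\leftrightarrow \omega^2_{\mathbf{V}} = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy, \\ H &\leftrightarrow \omega^3_H = H\,dx\,dy\,dz. \end{aligned} \]

The connection we made between the divergence and the exterior derivative can now be viewed in the following light. For the 2-form corresponding to $\mathbf{V}$,

\[ \omega^2_{\mathbf{V}} = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy, \]

we have

\[ d(\omega^2_{\mathbf{V}}) = (P_x + Q_y + R_z)\,dx\,dy\,dz = \omega^3_{\operatorname{div}\mathbf{V}}. \]

Hence, we can reformulate the divergence theorem as

\[ \iiint_B \omega^3_{\operatorname{div}\mathbf{V}} = \iint_{\partial B} \omega^2_{\mathbf{V}}, \quad\text{or}\quad \langle B, \omega^3_{\operatorname{div}\mathbf{V}}\rangle = \langle \partial B, \omega^2_{\mathbf{V}}\rangle, \]

for every suitable oriented region $B$ in $\mathbb{R}^3$.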

Suppose instead that we begin with the 0-form $\omega^0_f$ corresponding to a function $f$; then

\[ d(\omega^0_f) = df = f_x\,dx + f_y\,dy + f_z\,dz = \omega^1_{\operatorname{grad} f}, \]

where $\operatorname{grad} f = (f_x, f_y, f_z)$ is the gradient vector field of $f$. For any piecewise-smooth oriented path $C$, we have (cf. p. 425)

\[ \int_C \omega^1_{\operatorname{grad} f} = \int_C \operatorname{grad} f\cdot d\mathbf{x} = \int_C df = f(\text{end of } C) - f(\text{start of } C). \]

The right-hand side of this equation is the “0-dimensional integral” of $f = \omega^0_f$ over $\partial C = \text{end of } C - \text{start of } C$ (cf. pp. 428-429). Therefore, we can rewrite the equation itself as the symbolic pairing

\[ \langle C, \omega^1_{\operatorname{grad} f}\rangle = \langle \partial C, \omega^0_f\rangle. \]

This is, in essence, the fundamental theorem of calculus; compare it to our reformulation of the divergence theorem, above.

The gradient and the divergence are differential operators. The gradient takes as input a scalar field and produces as output a vector field. The divergence does the reverse: the input is a vector field, the output a scalar. We have just noted that these two differential operators correspond to the exterior derivative operator on $k$-forms when $k = 0$ and 2, respectively. What differential operator corresponds to the exterior derivative when $k = 1$, and how is it defined? To answer these questions, we begin with the correspondence

\[ \mathbf{F} = (A,B,C) \leftrightarrow \omega^1_{\mathbf{F}} = A\,dx + B\,dy + C\,dz. \]

A straightforward calculation (cf. p. 425) then gives

\[ d(\omega^1_{\mathbf{F}}) = (C_y - B_z)\,dy\,dz + (A_z - C_x)\,dz\,dx + (B_x - A_y)\,dx\,dy. \]

This is a 2-form; it corresponds to the new vector field

\[ \mathbf{V} = (C_y - B_z,\ A_z - C_x,\ B_x - A_y), \]

whose components are particular combinations of the derivatives of the components of $\mathbf{F}$. For reasons that emerge later, we call $\mathbf{V}$ the curl of $\mathbf{F}$.

Definition 11.2 The curl of the vector field $\mathbf{F} = (A,B,C)$ is the vector field

\[ \operatorname{curl}\mathbf{F} = (C_y - B_z,\ A_z - C_x,\ B_x - A_y). \]

The curl is thus a differential operator whose input and output are both vector fields. It completes the trio of operators that correspond to the exterior derivative; we have

\[ d(\omega^1_{\mathbf{F}}) = \omega^2_{\operatorname{curl}\mathbf{F}}. \]

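The gradient, divergence, and curl can all be expressed in terms of “nabla,” the vector differential operator

\[ \nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) \]

introduced on page 93 for two variables and extended here to three. By treating nabla as if it were an ordinary vector, we can combine it with scalar and vector fields using scalar multiplication and the dot and cross-products. Scalar multiplication (by a function placed to the right of nabla) gives the gradient, a vector function:

\[ \nabla f(x,y,z) = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = \operatorname{grad} f. \]

The dot (or scalar) product with a vector field gives the divergence, a scalar function:

\[ \nabla\cdot\mathbf{V} = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)\cdot(P,Q,R) = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z} = \operatorname{div}\mathbf{V}. \]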


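Corresponding
differential operators


$\operatorname{curl}\mathbf{F}$


$d(\omega^1_{\mathbf{F}}) = \omega^2_{\operatorname{curl}\mathbf{F}}$


nabla


$\nabla f = \operatorname{grad} f$
$\nabla\cdot\mathbf{V} = \operatorname{div}\mathbf{V}$
$\nabla\times\mathbf{F} = \operatorname{curl}\mathbf{F}$

Physical meaning
of $\operatorname{curl}\mathbf{F}$


Example 1:

$\mathbf{F} = (-\omega y, \omega x, 0)$


Angular velocity vector $\boldsymbol{\omega}$


The cross (or vector) product gives the curl, a vector function:

\[ \begin{aligned} \nabla\times\mathbf{F} &= \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)\times(A,B,C) = \left(\begin{vmatrix} \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ B & C \end{vmatrix},\ \begin{vmatrix} \dfrac{\partial}{\partial z} & \dfrac{\partial}{\partial x} \\ C & A \end{vmatrix},\ \begin{vmatrix} \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} \\ A & B \end{vmatrix}\right) \\ &= \left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z},\ \frac{\partial A}{\partial z} - \frac{\partial C}{\partial x},\ \frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) = \operatorname{curl}\mathbf{F}. \end{aligned} \]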

What is the physical meaning of $\operatorname{curl}\mathbf{F}$ when $\mathbf{F}$ describes a steady fluid flow? The answer to this question is complex and subtle, in part because $\operatorname{curl}\mathbf{F}$ is itself a vector rather than a scalar. To explore the question, we begin by looking at some examples.

For the first example, take $\mathbf{F} = (-\omega y, \omega x, 0)$, where $\omega$ is a constant (not a differential form!). Because the $z$-component is zero, the field $\mathbf{F}$ is everywhere parallel to the $(x,y)$-plane, so fluid in the plane $z = \text{constant}$ stays in that plane. The figure shows $\mathbf{F}$ in one such plane; it suggests that the $z$-axis is a vortex. We now show, more exactly, that the fluid rotates around the $z$-axis with angular speed $\omega$.

To begin, recall that $\mathbf{F}(x,y,z)$ represents the velocity of the fluid at the point $(x,y,z)$. Thus, if $\mathbf{x}(t) = (x(t), y(t), z(t))$ represents the position of a particle of fluid at time $t$, then its velocity at that point is $\mathbf{F}(\mathbf{x}(t))$, so

\[ \mathbf{x}'(t) = \mathbf{F}(\mathbf{x}(t)), \quad\text{or}\quad x' = -\omega y, \quad y' = \omega x, \quad z' = 0. \]

It follows that $x'' = (-\omega y)' = -\omega^2 x$, implying that the solutions are sines and cosines. The general solution is the three-parameter family

\[ x(t) = R\cos(\omega t + \phi), \qquad y(t) = R\sin(\omega t + \phi), \qquad z(t) = c; \]

the parameters $R \ge 0$, $\phi$, and $c$ are the arbitrary constants of integration. These equations describe the motion of a fluid particle that is initially (i.e., when $t = 0$) at the point whose cylindrical coordinates are $(R, \phi, c)$ (cf. Exercise 5.11, p. 178). The particle remains in the plane $z = c$, moves on the circle of radius $R$ centered on the $z$-axis, and makes an angle of $\theta = \omega t + \phi$ with the positive $x$-axis at time $t$. The angular speed is $\theta' = \omega$, as we wished to show.
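One way to check the solution is to confirm numerically that the stated path has velocity $\mathbf{F}$ at every instant. The sketch below does this with central differences; the constants (`omega`, `R`, `phase`, `c`) and the sample times are arbitrary values chosen here for illustration.

```python
import math

# Arbitrary sample constants (assumptions for illustration only).
omega, R, phase, c = 2.0, 1.5, 0.4, -1.0

def position(t):
    # The claimed solution: uniform rotation about the z-axis in the plane z = c.
    return (R * math.cos(omega * t + phase),
            R * math.sin(omega * t + phase),
            c)

def F(x, y, z):
    # The velocity field of Example 1.
    return (-omega * y, omega * x, 0.0)

def velocity(t, h=1e-6):
    # Central-difference approximation of the derivative of position(t).
    p, m = position(t + h), position(t - h)
    return tuple((pi - mi) / (2 * h) for pi, mi in zip(p, m))

max_err = 0.0
for k in range(10):
    t = 0.3 * k
    v = velocity(t)
    f = F(*position(t))
    max_err = max(max_err, max(abs(vi - fi) for vi, fi in zip(v, f)))

print(max_err)
```

The largest discrepancy between the numerical velocity and $\mathbf{F}$ along the path should be at the level of the finite-difference error, consistent with $\mathbf{x}'(t) = \mathbf{F}(\mathbf{x}(t))$.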

Any uniform rotation in space (such as we see in this example) is characterized 
by three elements: 

1. Its axis of rotation 

2. Its angular speed 

3. The direction it rotates around its axis 

We can use a vector, the angular velocity vector $\boldsymbol{\omega}$, to represent these three elements. We take $\boldsymbol{\omega}$ to be parallel to the axis of rotation, to have magnitude $\omega = \|\boldsymbol{\omega}\|$ equal to the angular speed, and to have the direction that the thumb points when the fingers of the right hand curl in the direction of the rotation. Thus, a spinning disk with angular velocity $\boldsymbol{\omega}$ turns in the counterclockwise direction when viewed from the side toward which $\boldsymbol{\omega}$ points. (This is the definition to use when the coordinate frame itself is right-handed. If it is left-handed, then we would curl the fingers of the left hand to determine the direction of $\boldsymbol{\omega}$.) The angular velocity vector captures all aspects of a uniform rotation except the location (as distinct from the direction) of the axis of rotation in space.

According to our analysis, the rotation of the flow $\mathbf{F}$ at any point on the $z$-axis is given by the angular velocity vector $\boldsymbol{\omega} = (0,0,\omega)$. A quick computation shows that

\[ \operatorname{curl}\mathbf{F} = (0,0,2\omega) = 2\boldsymbol{\omega}, \]

suggesting that $\operatorname{curl}\mathbf{F}$ essentially represents this uniform rotational motion (with $\|\operatorname{curl}\mathbf{F}\|/2$ equal to the angular speed). But there is a problem. Because $\operatorname{curl}\mathbf{F}$ is a field, it assigns a vector to each point $(x,y,z)$ in space, namely, the constant vector $(0,0,2\omega)$. At any point $(0,0,z)$ on the $z$-axis, this vector appears to explain the rotation we see in the flow. But at no other point is the flow a rotation around that point. What is $\operatorname{curl}\mathbf{F}$ telling us there?
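The "quick computation" can be mirrored numerically. The following sketch approximates the three components of the curl from Definition 11.2 with central differences; the helper `curl`, the step size, and the value of `omega` are choices made here for illustration. It evaluates the result at a point on the $z$-axis and at two points off it.

```python
def curl(F, p, h=1e-5):
    """Central-difference approximation of (C_y - B_z, A_z - C_x, B_x - A_y) at the point p."""
    def partial(component, axis):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        return (F(*plus)[component] - F(*minus)[component]) / (2 * h)
    A, B, C = 0, 1, 2      # indices of the components of F
    X, Y, Z = 0, 1, 2      # indices of the coordinate axes
    return (partial(C, Y) - partial(B, Z),
            partial(A, Z) - partial(C, X),
            partial(B, X) - partial(A, Y))

omega = 3.0  # arbitrary sample value

def rotation(x, y, z):
    # The field of Example 1: F = (-omega*y, omega*x, 0).
    return (-omega * y, omega * x, 0.0)

for p in [(0.0, 0.0, 0.0), (1.0, -2.0, 0.5), (4.0, 3.0, -1.0)]:
    print(p, curl(rotation, p))
```

Up to rounding, each line should show the same vector $(0, 0, 2\omega)$, which illustrates the point raised in the text: the curl is the same at points on the axis and at points far from it.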

In fact, the curl does give us information about rotation at every point, but the
rotation is not the rotation of the fluid itself. To see what is actually involved, it
is helpful to study a second flow that lacks the obvious vortex of Example 1. For
Example 2 we take $\mathbf{F} = (-ky, 0, 0)$, $k > 0$.



The figure shows how $\mathbf{F}$ looks in the $(x,y)$-plane; it looks the same in every
parallel plane. The fluid flows in straight lines parallel to the $x$-axis, moving left
when $y > 0$ and right when $y < 0$. Everywhere on the $(x,z)$-plane ($y = 0$), the fluid
is stationary. The fluid does not rotate. Nevertheless,

\[ \operatorname{curl}\mathbf{F} = (0, 0, k) \]

at every point $(x, y, z)$. If this were an angular velocity, it would represent a
counterclockwise rotation with angular speed $k/2$ around the vertical axis. What, if
anything, is rotating?

Place at the origin a little ball with a rough surface like a tennis ball; use one
whose density is the same as the fluid’s, so it will have no tendency to float or sink.
Because the fluid at the origin is motionless, the ball will stay put, but it will not
remain motionless. The shearing action of the nearby fluid will make it spin in place
around a vertical axis. The fluid at higher and lower levels (i.e., where $z > 0$ and
$z < 0$) flows the same way as in the $(x,y)$-plane; therefore that fluid will not alter the
way the ball moves. The net effect is a counterclockwise spin around the vertical;
only the magnitude of the spin (its angular speed $\omega$) remains undetermined. The
angular velocity vector of the little ball is thus a positive multiple of $\operatorname{curl}\mathbf{F}$.

A similar test ball placed anywhere on the x-axis or, for that matter, anywhere in 
the vertical (x,z)-plane should behave the same way. Off the (x,z)-plane, the flow is 
nonzero, and the ball will be carried along by the fluid. But fluid particles at points 
farther from the (x,z)-plane move even faster, so they drag that side of the ball 
forward; particles closer to the (x,z)-plane move more slowly, dragging that side of 
the ball back. The fluid thus has the same shearing effect on the moving ball that it 
does on the stationary one: it spins the ball counterclockwise as it carries it along. So 
curlF describes the rotation of the test ball everywhere, at least qualitatively. Only 
the quantitative link between angular speed and || curlF|| remains undetermined. 



Let us call this tendency of a moving fluid to spin an object that is carried along
with it the vorticity of the flow. Example 2 suggests that the vorticity of $\mathbf{F}$ is caused
by its shearing action and is described by $\operatorname{curl}\mathbf{F}$. In fact, by associating the curl with
a flow’s vorticity instead of its rotation, per se, we can clear up the puzzle of
Example 1. In that example, fluid farther from the $z$-axis moves faster than fluid that
is closer, but here the flow at one level $z = \text{constant}$ is the same as at any other.
Therefore, as a test ball moves with the fluid around the $z$-axis, it also spins because
of the shearing action of nearby fluid. At every point, the spin is counterclockwise
around a vertical axis, a motion described qualitatively by
$\operatorname{curl}\mathbf{F} = (0, 0, 2\omega)$ at that point.

To describe vorticity quantitatively and not just qualitatively, we need some way
to specify the magnitude of the spin induced by the shearing action of a flow. Return
to Example 1 and its simple rotational motion around the $z$-axis. As the fluid moves
around the circle of radius $r$ centered at the origin in the plane $z = c$, its (linear)
speed at any point is $\omega r$. If we think of speed as a measure of the “motion” of a
fluid, then the quantity

\[ \text{speed} \times \text{length of path} = \omega r \cdot 2\pi r = 2\pi\omega r^2 \]

describes, in some sense, the total motion of the fluid as it travels around that circle.
Note that this quantity, which we call the circulation, is proportional to the area of
the circle. In fact,

\[ \frac{\text{circulation}}{\text{area}} = \frac{2\pi\omega r^2}{\pi r^2} = 2\omega \]

is exactly the magnitude of the vector $\operatorname{curl}\mathbf{F} = (0, 0, 2\omega)$.

Staying with Example 1, we ask: Can we determine the circulation of the fluid
around a circle $C$ in some plane $z = c$ but centered at a point away from the origin?
Along $C$, the fluid flow is, in general, no longer tangent, so we need to decide how
to measure the fluid’s “motion.” The figure in the margin suggests we replace the
flow $\mathbf{F}$ by its tangential component $\mathbf{F}_{\mathbf{t}}$. Here $\mathbf{t}$ is the unit tangent vector to $C$ in the
counterclockwise direction. With this choice of $\mathbf{t}$, $C$ becomes the oriented path that
we denote as $\vec{C}$. Because the speed of the flow given by the tangential component is

\[ \|\mathbf{F}_{\mathbf{t}}\| = \mathbf{F}\cdot\mathbf{t}, \]

the “total motion” of this flow around $C$ will be the integral of this scalar quantity
with respect to arc length:

\[ \text{circulation} = \oint_{C} \mathbf{F}\cdot\mathbf{t}\,ds. \]


(The domain of a scalar integral is an unoriented path; cf. Definition 1.6.) On
page 19 we noted that this scalar integral has the same value as the vector integral

\[ \oint_{\vec{C}} \mathbf{F}\cdot d\mathbf{x}, \]

where $\vec{C}$ is $C$ provided with the orientation given by the unit tangent $\mathbf{t}$. If we
parametrize $\vec{C}$ as $(a + r\cos t,\ b + r\sin t,\ c)$ with $0 \le t \le 2\pi$, and recall that
$\mathbf{F} = (-\omega y, \omega x, 0)$, then

\begin{align*}
\text{circulation} = \oint_{\vec{C}} \mathbf{F}\cdot d\mathbf{x}
  &= \oint_{\vec{C}} -\omega y\,dx + \omega x\,dy \\
  &= \omega r \int_0^{2\pi} \bigl((b + r\sin t)\sin t + (a + r\cos t)\cos t\bigr)\,dt \\
  &= \omega r \int_0^{2\pi} (b\sin t + a\cos t + r)\,dt = 2\pi\omega r^2 .
\end{align*}

Thus, for every circle parallel to the $(x,y)$-plane, we have

\[ \frac{\text{circulation}}{\text{area}} = \frac{2\pi\omega r^2}{\pi r^2} = 2\omega; \]

circulation per unit area equals the magnitude of the vorticity vector $\operatorname{curl}\mathbf{F}$ at every
point.
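
The computation above can also be checked numerically. The following is a minimal sketch, not part of the original text: the helper `circulation` and the test values for $\omega$, the center $(a, b, c)$, and the radius $r$ are illustrative assumptions. It approximates the path integral of $\mathbf{F} = (-\omega y, \omega x, 0)$ around an off-center circle by the midpoint rule and divides by the area of the circle.

```python
import math

def circulation(F, path, n=20000):
    """Approximate the closed path integral of F . dx with the midpoint rule.

    F maps a point (x, y, z) to a vector; path maps t in [0, 2*pi] to a point.
    """
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        p0, p1 = path(i * dt), path((i + 1) * dt)
        mid = [(u + v) / 2 for u, v in zip(p0, p1)]
        dx = [v - u for u, v in zip(p0, p1)]
        total += sum(f * d for f, d in zip(F(mid), dx))
    return total

omega = 1.5                       # angular speed (arbitrary test value)
a, b, c, r = 2.0, -1.0, 0.5, 0.7  # center and radius (arbitrary test values)

def F(p):
    # The flow of Example 1.
    return (-omega * p[1], omega * p[0], 0.0)

def circle(t):
    # Counterclockwise circle in the plane z = c, centered away from the z-axis.
    return (a + r * math.cos(t), b + r * math.sin(t), c)

ratio = circulation(F, circle) / (math.pi * r ** 2)
print(ratio, 2 * omega)  # the two numbers should agree closely
```

Changing the center $(a, b, c)$ or the radius should leave the ratio essentially unchanged, in line with the calculation in the text.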

Although we have not established that circulation per unit area measures vorticity
in all cases, let us try it on the flow $\mathbf{F}$ of Example 2. This time we calculate the
circulation around a square instead of a circle, because the calculation reduces to
a simple product. Let $\vec{C}$ be the boundary of the square that lies in the plane $z = c$,
has its lower-left corner at the point $(a, b, c)$, and has sides of length $s$ parallel to the
$x$- and $y$-axes. Give $\vec{C}$ the counterclockwise orientation when seen from above (i.e.,
from where $z > c$).


The contributions to the circulation from the left and right sides are zero because
the tangential velocity is zero there. On the bottom side (where $y = b$), the tangential
velocity is $-kb$, so the contribution is $-kbs$. (In the figure, $b$ is chosen to be negative,
so $-kb$ is positive, as shown.) On the top side (where $y = b + s$), the contribution is
$+k(b + s)s$. Therefore,

\[ \text{circulation} = k(b + s)s - kbs = ks^2 . \]

The area of the square is $s^2$, and the vorticity vector is $\operatorname{curl}\mathbf{F} = (0, 0, k)$, so we find
once again that the circulation per unit area equals the magnitude of the vorticity
vector.

Staying with Example 2, let us determine circulation per unit area when the path
is a circle instead of a square. We take the same oriented circle $\vec{C}$ we used for
Example 1,

\[ (x, y, z) = (a + r\cos t,\ b + r\sin t,\ c), \qquad 0 \le t \le 2\pi . \]

With the flow field $\mathbf{F} = (-ky, 0, 0)$, we have

\[ \text{circulation} = \oint_{\vec{C}} -ky\,dx
   = \int_0^{2\pi} \bigl(krb\sin t + kr^2\sin^2 t\bigr)\,dt = k\pi r^2 . \]

For these paths, at least, circulation per unit area also equals $k$.

Definition 11.3 The circulation of the flow $\mathbf{F}$ around the oriented closed loop $\vec{C}$ is
the path integral

\[ \text{circulation of } \mathbf{F} \text{ around } \vec{C} = \oint_{\vec{C}} \mathbf{F}\cdot d\mathbf{x}. \]


Circulation around 
other curves 


Circles with 
arbitrary orientation 


In our two examples, vorticity was constant in both magnitude and direction, and 
we calculated the circulation only around curves lying in planes perpendicular to the 
vorticity vector curlF. Suppose we take an arbitrary plane; how does the circulation 
around a curve in that plane depend on the orientation of the plane? 


(Figure: the plane $\mathbf{n}\cdot(\mathbf{x} - \mathbf{a}) = 0$.)


We can parametrize the oriented circle $\vec{C}$ of radius $r$ that is centered at the point $\mathbf{a}$
and lies in the plane with unit normal $\mathbf{n}$ by choosing two perpendicular unit vectors
$\mathbf{u}_1$ and $\mathbf{u}_2$ for which $\mathbf{u}_1 \times \mathbf{u}_2 = \mathbf{n}$ and setting

\[ \mathbf{x}(t) = \mathbf{a} + (r\cos t)\,\mathbf{u}_1 + (r\sin t)\,\mathbf{u}_2, \qquad 0 \le t \le 2\pi . \]


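Let $\mathbf{a} = (a, b, c)$, $\mathbf{u}_1 = (\alpha_1, \beta_1, \gamma_1)$, and $\mathbf{u}_2 = (\alpha_2, \beta_2, \gamma_2)$; then the
circulation of $\mathbf{F} = (-ky, 0, 0)$ (Example 2) around $\vec{C}$ is

\begin{align*}
\oint_{\vec{C}} -ky\,dx
  &= \int_0^{2\pi} -k\,(b + r\beta_1\cos t + r\beta_2\sin t)(-r\alpha_1\sin t + r\alpha_2\cos t)\,dt \\
  &= kr^2 \int_0^{2\pi} \bigl(\alpha_1\beta_2\sin^2 t - \beta_1\alpha_2\cos^2 t\bigr)\,dt \\
  &= kr^2(\alpha_1\beta_2\pi - \beta_1\alpha_2\pi)
   = \pi r^2 k \begin{vmatrix} \alpha_1 & \beta_1 \\ \alpha_2 & \beta_2 \end{vmatrix}.
\end{align*}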

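(Of the six terms in the second integral, only the two that make a nonzero contribution
have been carried over to the third integral.) Because

\[ \mathbf{n} = (\alpha_1, \beta_1, \gamma_1) \times (\alpha_2, \beta_2, \gamma_2)
   = \left( \begin{vmatrix} \beta_1 & \gamma_1 \\ \beta_2 & \gamma_2 \end{vmatrix},\
            \begin{vmatrix} \gamma_1 & \alpha_1 \\ \gamma_2 & \alpha_2 \end{vmatrix},\
            \begin{vmatrix} \alpha_1 & \beta_1 \\ \alpha_2 & \beta_2 \end{vmatrix} \right) \]

and $\operatorname{curl}\mathbf{F} = (0, 0, k)$, we can express the circulation of $\mathbf{F}$ around $\vec{C}$ as

\[ (\text{area inside } C)\ \operatorname{curl}\mathbf{F}\cdot\mathbf{n}. \]

For this example, at least, we see how the orientation of the plane containing $C$
affects the value of the circulation. It says that if the unit normal $\mathbf{n}$ makes an angle
of $\theta$ radians with $\operatorname{curl}\mathbf{F}$, then

\[ \frac{\text{circulation of } \mathbf{F} \text{ around } \vec{C}}{\text{area inside } C}
   = \|\operatorname{curl}\mathbf{F}\|\cos\theta . \]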

For example, suppose the test ball in Example 2 is constrained to rotate around
an axis parallel to $\mathbf{n}$. If $\mathbf{n}$ is vertical (i.e., $\theta = 0$), the ball will spin as before.
However, as we increase $\theta$ and tilt $\mathbf{n}$ away from the vertical, the shearing effect of the
nearby fluid, and with it the ball’s rate of spin, will decrease. As the axis becomes
horizontal, the rate of spin will approach zero.
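
The dependence on the tilt angle can be illustrated numerically. The sketch below is an illustration only: the helper `circulation`, the particular choice of $\mathbf{u}_1$, $\mathbf{u}_2$, and the test values for $k$, $r$, and the center are assumptions, not taken from the text. It tilts the normal $\mathbf{n} = (\sin\theta, 0, \cos\theta)$ away from the vertical and compares circulation per unit area for the flow of Example 2 with $k\cos\theta$.

```python
import math

def circulation(F, path, n=20000):
    """Approximate the closed path integral of F . dx with the midpoint rule."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        p0, p1 = path(i * dt), path((i + 1) * dt)
        mid = [(u + v) / 2 for u, v in zip(p0, p1)]
        dx = [v - u for u, v in zip(p0, p1)]
        total += sum(f * d for f, d in zip(F(mid), dx))
    return total

k, r = 2.0, 0.5            # shear constant and radius (arbitrary test values)
center = (1.0, -3.0, 2.0)  # center of the circle (arbitrary test value)

def F(p):
    # The flow of Example 2; curl F = (0, 0, k).
    return (-k * p[1], 0.0, 0.0)

def tilted_circle(theta):
    # u1 x u2 = (sin theta, 0, cos theta), a unit normal that makes the
    # angle theta with curl F.
    u1 = (math.cos(theta), 0.0, -math.sin(theta))
    u2 = (0.0, 1.0, 0.0)
    def path(t):
        return tuple(center[i] + r * math.cos(t) * u1[i] + r * math.sin(t) * u2[i]
                     for i in range(3))
    return path

ratios = {}
for deg in (0, 30, 60, 90):
    theta = math.radians(deg)
    ratios[deg] = circulation(F, tilted_circle(theta)) / (math.pi * r ** 2)
    print(deg, ratios[deg], k * math.cos(theta))
```

As the axis tilts toward the horizontal, the computed ratio falls from $k$ to (numerically) zero, matching the behavior of the constrained test ball.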

The connection between vorticity and circulation that we have noted in the examples
holds, in fact, quite generally. To generalize, let us first replace the orthogonal
vectors $r\mathbf{u}_1$ and $r\mathbf{u}_2$ in the parametrization of the circle $\vec{C}$ above by arbitrary linearly
independent vectors $\mathbf{v}_1$ and $\mathbf{v}_2$:

\[ \mathbf{x}(t) = \mathbf{a} + (\cos t)\,\mathbf{v}_1 + (\sin t)\,\mathbf{v}_2, \qquad 0 \le t \le 2\pi . \]

The image is an oriented ellipse $\vec{E}$ whose area is $\pi\|\mathbf{v}_1 \times \mathbf{v}_2\|$. To see this, consider the
linear map $L$ of the plane containing $C$ to the plane containing $E$ given by
$L(r\mathbf{u}_i) = \mathbf{v}_i$, $i = 1, 2$.



Then $E$ is an ellipse because it is the image of a circle under a linear map.
Furthermore, because $L$ maps the square $r\mathbf{u}_1 \wedge r\mathbf{u}_2$ to the parallelogram $\mathbf{v}_1 \wedge \mathbf{v}_2$,

\[ |\text{area magnification factor for } L|
   = \frac{\operatorname{area}(\mathbf{v}_1 \wedge \mathbf{v}_2)}{\operatorname{area}(r\mathbf{u}_1 \wedge r\mathbf{u}_2)}
   = \frac{\|\mathbf{v}_1 \times \mathbf{v}_2\|}{r^2}. \]

Finally, because $E = L(C)$ and $C$ has area $\pi r^2$, the area of $E$ is $\pi\|\mathbf{v}_1 \times \mathbf{v}_2\|$.

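To continue with our generalization of the connection between vorticity and
circulation, we note that a flow field $\mathbf{F} = (A, B, C)$ defined on an open region $\Omega$ in $\mathbb{R}^3$
can be considered a map $\mathbf{F} : \Omega \to \mathbb{R}^3$,

\[ \mathbf{F} : \begin{cases} u = A(x,y,z), \\ v = B(x,y,z), \\ w = C(x,y,z). \end{cases} \]

In particular, if the map $\mathbf{F}$ is differentiable on $\Omega$, then we have

\[ \mathbf{F}(\mathbf{a} + \Delta\mathbf{x}) = \mathbf{F}(\mathbf{a}) + d\mathbf{F}_{\mathbf{a}}(\Delta\mathbf{x}) + o(\Delta\mathbf{x})
   \quad\text{as } \Delta\mathbf{x} \to \mathbf{0}, \]

for any point $\mathbf{a}$ in $\Omega$ (Definition 4.6, p. 129). The following theorem now considers
the circulation of $\mathbf{F}$ around the family of ellipses

\[ \vec{E}_\varepsilon : \ \mathbf{x}(t) = \mathbf{a} + (\varepsilon\cos t)\,\mathbf{v}_1 + (\varepsilon\sin t)\,\mathbf{v}_2,
   \qquad 0 \le t \le 2\pi . \]

Note that $\operatorname{area} E_\varepsilon = \pi\varepsilon^2\|\mathbf{v}_1 \times \mathbf{v}_2\|$. In our examples, the ratio of circulation to area
had a constant value, namely, $\operatorname{curl}\mathbf{F}\cdot\mathbf{n}$. Here the ratio is no longer constant;
nevertheless, its limiting value equals $\operatorname{curl}\mathbf{F}\cdot\mathbf{n}$ as the area approaches zero.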

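Theorem 11.8. If $\mathbf{F} : \Omega \to \mathbb{R}^3$ is a differentiable flow field, then

\[ \lim_{\varepsilon \to 0}
   \frac{\text{circulation of } \mathbf{F} \text{ around } \vec{E}_\varepsilon}{\text{area inside } E_\varepsilon}
   = \operatorname{curl}\mathbf{F}\cdot\mathbf{n}, \]

where $\mathbf{n}$ is the unit normal in the plane containing the ellipses $E_\varepsilon$.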

Proof. The proof follows from a sequence of lemmas. 

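Lemma 11.1. $\displaystyle \oint_{\vec{E}_\varepsilon} \mathbf{F}\cdot d\mathbf{x}
 = \pi\varepsilon^2\bigl[d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2 - d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1\bigr] + o(\varepsilon^2)$ as $\varepsilon \to 0$.

Proof. We have

\[ \oint_{\vec{E}_\varepsilon} \mathbf{F}\cdot d\mathbf{x}
   = \int_0^{2\pi} \mathbf{F}(\mathbf{a} + \varepsilon\cos t\,\mathbf{v}_1 + \varepsilon\sin t\,\mathbf{v}_2)
     \cdot (-\varepsilon\sin t\,\mathbf{v}_1 + \varepsilon\cos t\,\mathbf{v}_2)\,dt . \]

Set $\Delta\mathbf{x} = \varepsilon\cos t\,\mathbf{v}_1 + \varepsilon\sin t\,\mathbf{v}_2$. Because $\Delta\mathbf{x} \to \mathbf{0}$ as $\varepsilon \to 0$, we have
$o(\Delta\mathbf{x}) = o(\varepsilon)$. Therefore, using the differentiability of $\mathbf{F}$, we can write the integral as

\[ \int_0^{2\pi} \bigl(\mathbf{F}(\mathbf{a}) + \varepsilon\cos t\,d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1) + \varepsilon\sin t\,d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2) + o(\varepsilon)\bigr)
   \cdot (-\varepsilon\sin t\,\mathbf{v}_1 + \varepsilon\cos t\,\mathbf{v}_2)\,dt . \]

When we expand the dot product, we get eight terms. Four have average value zero
and therefore vanish when integrated; the others combine to give

\begin{align*}
\oint_{\vec{E}_\varepsilon} \mathbf{F}\cdot d\mathbf{x}
  &= \int_0^{2\pi} \bigl(\varepsilon^2\cos^2 t\ d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2
     - \varepsilon^2\sin^2 t\ d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1 + o(\varepsilon^2)\bigr)\,dt \\
  &= \pi\varepsilon^2\bigl[d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2 - d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1\bigr] + o(\varepsilon^2)
     \quad\text{as } \varepsilon \to 0 .
\end{align*}

□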


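We find that the expression
$\psi(\mathbf{v}_1, \mathbf{v}_2) = d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2 - d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1$ is the key to
relating the circulation of $\mathbf{F}$ to $\operatorname{curl}\mathbf{F}$.

Lemma 11.2. There is a unique vector $\mathbf{q}$ for which

\[ \psi(\mathbf{v}_1, \mathbf{v}_2) = d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2 - d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1
   = \mathbf{q}\cdot(\mathbf{v}_1 \times \mathbf{v}_2), \]

for all vectors $\mathbf{v}_1$, $\mathbf{v}_2$ in $\mathbb{R}^3$.

Proof. For any vector $\mathbf{q}$, define the function

\[ \Gamma_{\mathbf{q}}(\mathbf{v}_1, \mathbf{v}_2) = \mathbf{q}\cdot(\mathbf{v}_1 \times \mathbf{v}_2). \]

Then $\psi$ and $\Gamma_{\mathbf{q}}$ are both bilinear and antisymmetric (cf. p. 62); that is, they are linear
functions of each of their inputs and, furthermore,

\[ \psi(\mathbf{v}_2, \mathbf{v}_1) = -\psi(\mathbf{v}_1, \mathbf{v}_2), \qquad
   \Gamma_{\mathbf{q}}(\mathbf{v}_2, \mathbf{v}_1) = -\Gamma_{\mathbf{q}}(\mathbf{v}_1, \mathbf{v}_2), \]

for all pairs $\mathbf{v}_1$, $\mathbf{v}_2$. Therefore, if $\psi$ and $\Gamma_{\mathbf{q}}$ agree on a basis for $\mathbb{R}^3$, they must agree
everywhere (cf. Exercise 11.11).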

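Set $\mathbf{q} = (q_1, q_2, q_3)$, where

\[ q_1 = \psi(\mathbf{e}_2, \mathbf{e}_3), \qquad q_2 = \psi(\mathbf{e}_3, \mathbf{e}_1), \qquad q_3 = \psi(\mathbf{e}_1, \mathbf{e}_2), \]

and $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ is the standard basis for $\mathbb{R}^3$. Then

\begin{align*}
\Gamma_{\mathbf{q}}(\mathbf{e}_2, \mathbf{e}_3) &= \mathbf{q}\cdot(\mathbf{e}_2 \times \mathbf{e}_3) = \mathbf{q}\cdot\mathbf{e}_1 = q_1 = \psi(\mathbf{e}_2, \mathbf{e}_3), \\
\Gamma_{\mathbf{q}}(\mathbf{e}_3, \mathbf{e}_1) &= \mathbf{q}\cdot(\mathbf{e}_3 \times \mathbf{e}_1) = \mathbf{q}\cdot\mathbf{e}_2 = q_2 = \psi(\mathbf{e}_3, \mathbf{e}_1), \\
\Gamma_{\mathbf{q}}(\mathbf{e}_1, \mathbf{e}_2) &= \mathbf{q}\cdot(\mathbf{e}_1 \times \mathbf{e}_2) = \mathbf{q}\cdot\mathbf{e}_3 = q_3 = \psi(\mathbf{e}_1, \mathbf{e}_2);
\end{align*}

by antisymmetry, we find $\Gamma_{\mathbf{q}}(\mathbf{e}_i, \mathbf{e}_j) = \psi(\mathbf{e}_i, \mathbf{e}_j)$ for all $i, j = 1, 2, 3$. □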

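Any square matrix $M$ can be written uniquely as the sum of a symmetric and an
antisymmetric matrix. Set

\[ S = \tfrac12\,(M + M^{\mathsf T}), \qquad \mathcal{A} = \tfrac12\,(M - M^{\mathsf T}), \]

where $M^{\mathsf T}$ is the transpose of $M$. Then $S^{\mathsf T} = S$, $\mathcal{A}^{\mathsf T} = -\mathcal{A}$, and $M = S + \mathcal{A}$.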

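Lemma 11.3. For any $3 \times 3$ matrix $M$,

\[ \psi_M(\mathbf{v}_1, \mathbf{v}_2) = M\mathbf{v}_1\cdot\mathbf{v}_2 - M\mathbf{v}_2\cdot\mathbf{v}_1
   = \mathcal{A}\mathbf{v}_1\cdot\mathbf{v}_2 - \mathcal{A}\mathbf{v}_2\cdot\mathbf{v}_1 = \psi_{\mathcal{A}}(\mathbf{v}_1, \mathbf{v}_2), \]

where $\mathcal{A}$ is the antisymmetric part of $M$: $\mathcal{A} = \tfrac12(M - M^{\mathsf T})$.

Proof. Write $M = S + \mathcal{A}$, where $S = \tfrac12(M + M^{\mathsf T})$; then it is easy to see that
$\psi_M = \psi_S + \psi_{\mathcal{A}}$. Now write the dot products as matrix multiplications using column vectors
and their transposes:

\[ \psi_S = (S\mathbf{v}_1)^{\mathsf T}\mathbf{v}_2 - (S\mathbf{v}_2)^{\mathsf T}\mathbf{v}_1
   = \mathbf{v}_1^{\mathsf T} S^{\mathsf T}\mathbf{v}_2 - \mathbf{v}_2^{\mathsf T} S^{\mathsf T}\mathbf{v}_1
   = \mathbf{v}_1^{\mathsf T} S^{\mathsf T}\mathbf{v}_2 - \mathbf{v}_2^{\mathsf T} S\,\mathbf{v}_1 . \]

In the last term we used the symmetry of $S$. Each term is a scalar (i.e., a $1 \times 1$
matrix), so is equal to its own transpose. Thus we can write

\[ \mathbf{v}_1^{\mathsf T} S^{\mathsf T}\mathbf{v}_2 = \bigl(\mathbf{v}_1^{\mathsf T} S^{\mathsf T}\mathbf{v}_2\bigr)^{\mathsf T} = \mathbf{v}_2^{\mathsf T} S\,\mathbf{v}_1, \]

from which it follows that $\psi_S = 0$ and hence $\psi_M = \psi_{\mathcal{A}}$. □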

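For the map $\mathbf{F}$ with components $(A, B, C)$, the derivative and its antisymmetric
part are

\[ d\mathbf{F}_{\mathbf{a}} = \begin{pmatrix} A_x & A_y & A_z \\ B_x & B_y & B_z \\ C_x & C_y & C_z \end{pmatrix},
   \qquad
   \mathcal{A} = \frac12 \begin{pmatrix} 0 & -\gamma & \beta \\ \gamma & 0 & -\alpha \\ -\beta & \alpha & 0 \end{pmatrix}, \]

where

\[ \alpha = C_y - B_z, \qquad \beta = A_z - C_x, \qquad \gamma = B_x - A_y \]

and all components are evaluated at $\mathbf{x} = \mathbf{a}$. Labels are chosen for the components of
$\mathcal{A}$ so that $(\alpha, \beta, \gamma) = \operatorname{curl}\mathbf{F}(\mathbf{a})$. With $\mathcal{A}$ we can now determine the vorticity vector
$\mathbf{q}$ that is provided by Lemma 11.2.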


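Lemma 11.4. $d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2 - d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1 = \operatorname{curl}\mathbf{F}(\mathbf{a})\cdot(\mathbf{v}_1 \times \mathbf{v}_2)$.

Proof. Because $\psi_{d\mathbf{F}_{\mathbf{a}}} = \psi_{\mathcal{A}}$, we have

\[ q_1 = \psi_{\mathcal{A}}(\mathbf{e}_2, \mathbf{e}_3) = \mathcal{A}\mathbf{e}_2\cdot\mathbf{e}_3 - \mathcal{A}\mathbf{e}_3\cdot\mathbf{e}_2
   = \frac12 \begin{pmatrix} -\gamma \\ 0 \\ \alpha \end{pmatrix}\cdot\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
   - \frac12 \begin{pmatrix} \beta \\ -\alpha \\ 0 \end{pmatrix}\cdot\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
   = \alpha . \]

In a similar way, you can show $q_2 = \beta$, $q_3 = \gamma$.

□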
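
The identity of Lemma 11.4 is easy to spot-check numerically. In the sketch below, the matrix `M` is a hypothetical stand-in for $d\mathbf{F}_{\mathbf{a}}$ (its entries are arbitrary test values), and the helper names are illustrative assumptions. The vector `q` is read off from the antisymmetric part of `M`, and $\psi_M(\mathbf{v}_1, \mathbf{v}_2)$ is compared with $\mathbf{q}\cdot(\mathbf{v}_1 \times \mathbf{v}_2)$ for a few pairs of vectors.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Hypothetical derivative matrix: rows hold the partial derivatives of A, B, C.
M = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0],
     [-1.5, 2.5, 7.0]]

q = (M[2][1] - M[1][2],   # alpha = C_y - B_z
     M[0][2] - M[2][0],   # beta  = A_z - C_x
     M[1][0] - M[0][1])   # gamma = B_x - A_y

def psi(v1, v2):
    # psi_M(v1, v2) = M v1 . v2 - M v2 . v1
    return dot(mat_vec(M, v1), v2) - dot(mat_vec(M, v2), v1)

pairs = [((1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)),
         ((0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
         ((2.0, -1.0, 0.0), (0.3, 0.3, -4.0))]

results = [(psi(v1, v2), dot(q, cross(v1, v2))) for v1, v2 in pairs]
for left, right in results:
    print(left, right)  # the two columns should agree
```

Only the antisymmetric part of `M` enters `q`; changing the diagonal or adding any symmetric matrix to `M` should leave both columns unchanged.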
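To complete the proof of Theorem 11.8, use Lemma 11.1 and Lemma 11.4 to
write

\begin{align*}
\frac{\text{circulation of } \mathbf{F} \text{ around } \vec{E}_\varepsilon}{\text{area inside } E_\varepsilon}
  &= \frac{\pi\varepsilon^2\bigl[d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_1)\cdot\mathbf{v}_2 - d\mathbf{F}_{\mathbf{a}}(\mathbf{v}_2)\cdot\mathbf{v}_1\bigr] + o(\varepsilon^2)}
          {\pi\varepsilon^2\,\|\mathbf{v}_1 \times \mathbf{v}_2\|} \\
  &= \operatorname{curl}\mathbf{F}\cdot\frac{\mathbf{v}_1 \times \mathbf{v}_2}{\|\mathbf{v}_1 \times \mathbf{v}_2\|} + \frac{o(\varepsilon^2)}{\varepsilon^2}
   = \operatorname{curl}\mathbf{F}\cdot\mathbf{n} + \frac{o(\varepsilon^2)}{\varepsilon^2}.
\end{align*}

In the second term, the divisor $\pi\|\mathbf{v}_1 \times \mathbf{v}_2\|$ has been absorbed into $o(\varepsilon^2)$. The
theorem then follows because, by definition, $o(\varepsilon^2)/\varepsilon^2 \to 0$ as $\varepsilon \to 0$. □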
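
To see the limit in Theorem 11.8 at work, it helps to try a field whose curl is not constant. The sketch below is an illustration under stated assumptions: the field $\mathbf{F} = (-y^3, x^2, 0)$, the point $\mathbf{a}$, the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and the helper names are all chosen here for the test and do not come from the text. For this field $\operatorname{curl}\mathbf{F} = (0, 0, 2x + 3y^2)$, and the ratio of circulation to area is computed for shrinking ellipses $E_\varepsilon$.

```python
import math

def circulation(F, path, n=20000):
    """Approximate the closed path integral of F . dx with the midpoint rule."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        p0, p1 = path(i * dt), path((i + 1) * dt)
        mid = [(u + v) / 2 for u, v in zip(p0, p1)]
        dx = [v - u for u, v in zip(p0, p1)]
        total += sum(f * d for f, d in zip(F(mid), dx))
    return total

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

a = (1.0, 2.0, 0.0)    # base point (arbitrary test value)
v1 = (1.0, 0.0, 1.0)   # spanning vectors of the ellipses (arbitrary test values)
v2 = (0.0, 2.0, 1.0)

def F(p):
    x, y, z = p
    return (-y ** 3, x ** 2, 0.0)   # curl F = (0, 0, 2x + 3y^2)

w = cross(v1, v2)
norm_w = math.sqrt(sum(c * c for c in w))
n_unit = [c / norm_w for c in w]
curl_at_a = (0.0, 0.0, 2 * a[0] + 3 * a[1] ** 2)
target = sum(c * m for c, m in zip(curl_at_a, n_unit))   # curl F(a) . n

def ellipse(eps):
    def path(t):
        return tuple(a[i] + eps * math.cos(t) * v1[i] + eps * math.sin(t) * v2[i]
                     for i in range(3))
    return path

errors = []
for eps in (1.0, 0.1, 0.01):
    area = math.pi * eps ** 2 * norm_w
    ratio = circulation(F, ellipse(eps)) / area
    errors.append(abs(ratio - target))
    print(eps, ratio, target)
```

For this particular field the gap between the ratio and $\operatorname{curl}\mathbf{F}(\mathbf{a})\cdot\mathbf{n}$ should shrink roughly in proportion to $\varepsilon^2$.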


In the proof of Lemma 11.2, there is no motivation (other than hindsight) for the
formula

\[ \mathbf{q} = \bigl(\psi(\mathbf{e}_2, \mathbf{e}_3),\ \psi(\mathbf{e}_3, \mathbf{e}_1),\ \psi(\mathbf{e}_1, \mathbf{e}_2)\bigr) \]

that expresses the components of the vorticity vector $\mathbf{q}$ in terms of particular
circulation calculations. According to this equation, the $x$-component of vorticity comes
from circulation in a plane normal to the $x$-axis (i.e., a plane parallel to the vectors
$\mathbf{e}_2$ and $\mathbf{e}_3$ that determine the $(y,z)$-plane). Similarly, the $y$-component of vorticity $\mathbf{q}$
uses a plane normal to the $y$-axis, and the $z$-component of $\mathbf{q}$ uses a plane normal
to the $z$-axis. Let us now reconstruct $\mathbf{q}$ by calculating anew the circulation in those
planes. The work will look similar to the initial work (pp. 449-451) that led to our
identifying the divergence of a field.

It is convenient to calculate the circulation around the boundary of a rectangle
(as we did in Example 2) rather than around an ellipse. On a rectangle whose sides
are parallel to the axes, the tangential component of $\mathbf{F}$ on a side is just one of the
Cartesian components of $\mathbf{F}$. We begin with the oriented rectangle $R_x$ centered at the
point $\mathbf{a} = (a, b, c)$ and lying in the plane $x = a$ (and thus parallel to the $(y,z)$-plane).
Let the lengths of its sides be denoted $\Delta y$ and $\Delta z$, and orient it counterclockwise
when viewed from the side where $x > a$. Its orientation normal is then $\mathbf{n} = (1, 0, 0)$.



On the bottom edge (where $z = c - \Delta z/2$), the tangential component of
$\mathbf{F} = (A, B, C)$ is $B(a, y, c - \Delta z/2)$. If we approximate $B$ everywhere on this edge by its
value at the center, $(a, b, c - \Delta z/2)$, then the contribution this edge makes to the
circulation of $\mathbf{F}$ around $\partial R_x$ is approximately

\[ B(a, b, c - \Delta z/2)\,\Delta y . \]

Along the top edge, the tangential component of $\mathbf{F}$ is $-B(a, y, c + \Delta z/2)$. The minus
sign is needed because the unit tangent to $\partial R_x$ on this edge is $\mathbf{t} = (0, -1, 0)$. If
we approximate $-B$ everywhere by its value $-B(a, b, c + \Delta z/2)$ at the center, the
approximate contribution this edge makes to the circulation is

\[ -B(a, b, c + \Delta z/2)\,\Delta y . \]

If we put these together and use the microscope equation, we have

\begin{align*}
-B(a, b, c + \Delta z/2)\,\Delta y + B(a, b, c - \Delta z/2)\,\Delta y
  &= -\frac{B(a, b, c + \Delta z/2) - B(a, b, c - \Delta z/2)}{\Delta z}\,\Delta z\,\Delta y \\
  &\approx -B_z(a, b, c)\,\Delta y\,\Delta z .
\end{align*}

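For the right and left edges there is a similar result: together they contribute
approximately

\begin{align*}
C(a, b + \Delta y/2, c)\,\Delta z - C(a, b - \Delta y/2, c)\,\Delta z
  &= \frac{C(a, b + \Delta y/2, c) - C(a, b - \Delta y/2, c)}{\Delta y}\,\Delta y\,\Delta z \\
  &\approx C_y(a, b, c)\,\Delta y\,\Delta z .
\end{align*}

Thus we can write

\[ \text{circulation of } \mathbf{F} \text{ around } \partial R_x \approx \bigl(C_y(\mathbf{a}) - B_z(\mathbf{a})\bigr)\operatorname{area} R_x . \]

The factor $C_y - B_z$ is indeed the $x$-component of $\operatorname{curl}\mathbf{F}$.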

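For a similar rectangle $R_y$ in the plane $y = b$ parallel to the $(z,x)$-plane, the
tangential components of $\mathbf{F}$ at the centers of the sides parallel to the $z$-axis are
$C(a - \Delta x/2, b, c)$ and $-C(a + \Delta x/2, b, c)$. Their contribution to the circulation is
approximately

\begin{align*}
-C(a + \Delta x/2, b, c)\,\Delta z + C(a - \Delta x/2, b, c)\,\Delta z
  &= -\frac{C(a + \Delta x/2, b, c) - C(a - \Delta x/2, b, c)}{\Delta x}\,\Delta x\,\Delta z \\
  &\approx -C_x(a, b, c)\,\Delta z\,\Delta x .
\end{align*}

The tangential components of $\mathbf{F}$ at the centers of the remaining two sides are
$A(a, b, c + \Delta z/2)$ and $-A(a, b, c - \Delta z/2)$, making together the approximate
contribution

\begin{align*}
A(a, b, c + \Delta z/2)\,\Delta x - A(a, b, c - \Delta z/2)\,\Delta x
  &= \frac{A(a, b, c + \Delta z/2) - A(a, b, c - \Delta z/2)}{\Delta z}\,\Delta z\,\Delta x \\
  &\approx A_z(a, b, c)\,\Delta z\,\Delta x
\end{align*}

to the circulation around $\partial R_y$. Thus

\[ \text{circulation of } \mathbf{F} \text{ around } \partial R_y \approx \bigl(A_z(\mathbf{a}) - C_x(\mathbf{a})\bigr)\operatorname{area} R_y; \]

the factor $A_z - C_x$ is the $y$-component of $\operatorname{curl}\mathbf{F}$. A similar analysis of an oriented
rectangle $R_z$ in the plane $z = c$ leads to the result

\[ \text{circulation of } \mathbf{F} \text{ around } \partial R_z \approx \bigl(B_x(\mathbf{a}) - A_y(\mathbf{a})\bigr)\operatorname{area} R_z; \]

the factor $B_x - A_y$ gives the final component of $\operatorname{curl}\mathbf{F}$.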
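
The edge-center approximations above translate directly into a small computation. The sketch below is illustrative only: the field $\mathbf{F} = (yz^2, x^2 z, xy^3)$, the point $\mathbf{a}$, the side length `d`, and the helper names are assumptions made for the test. For each coordinate plane it combines the four edge contributions exactly as in the text and compares circulation per unit area with the matching component of $\operatorname{curl}\mathbf{F}$ computed by hand.

```python
def F(p):
    x, y, z = p
    return (y * z ** 2, x ** 2 * z, x * y ** 3)   # (A, B, C), a test field

def curl_F(p):
    # Hand-computed curl of the test field: (C_y - B_z, A_z - C_x, B_x - A_y).
    x, y, z = p
    return (3 * x * y ** 2 - x ** 2, 2 * y * z - y ** 3, 2 * x * z - z ** 2)

def shifted(p, axis, h):
    q = list(p)
    q[axis] += h
    return tuple(q)

def circulation_per_area(F, a, i, d=1e-3):
    """Edge-center estimate for a square of side d centered at a, normal to axis i."""
    j, k = (i + 1) % 3, (i + 2) % 3
    # The two sides parallel to axis k sit at a +/- (d/2) e_j.
    along_k = (F(shifted(a, j, d / 2))[k] - F(shifted(a, j, -d / 2))[k]) * d
    # The two sides parallel to axis j sit at a +/- (d/2) e_k.
    along_j = (F(shifted(a, k, d / 2))[j] - F(shifted(a, k, -d / 2))[j]) * d
    return (along_k - along_j) / (d * d)

a = (1.0, 2.0, 3.0)
estimates = [circulation_per_area(F, a, i) for i in range(3)]
exact = curl_F(a)
for est, ex in zip(estimates, exact):
    print(est, ex)  # each estimate should be close to the curl component
```

With `i = 0` the function reproduces the combination $C_y - B_z$ found for $R_x$; the cyclic index shift handles $R_y$ and $R_z$ in the same way.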

Thus, the circulation around any one of the rectangles $R$ is approximately
$\operatorname{curl}\mathbf{F}(\mathbf{a})\cdot\mathbf{n}\ \operatorname{area} R$. But this expression is itself an approximation to the flux of the
vector field $\operatorname{curl}\mathbf{F}$ through the small rectangle $R$ (Definition 10.1, p. 388). Hence,

\[ \text{circulation of } \mathbf{F} \text{ around } \partial R \approx \text{flux of } \operatorname{curl}\mathbf{F} \text{ through } R . \]

Even more is true. In the next section, we show that, for an oriented surface $S$ with
boundary $\partial S$, the circulation of the vector field $\mathbf{F}$ around $\partial S$ is equal to the flux of
its vorticity field $\operatorname{curl}\mathbf{F}$ through $S$:

\[ \text{circulation of } \mathbf{F} \text{ around } \partial S = \text{flux of } \operatorname{curl}\mathbf{F} \text{ through } S, \]

\[ \oint_{\partial S} \mathbf{F}\cdot\mathbf{t}\,ds = \iint_{S} \operatorname{curl}\mathbf{F}\cdot\mathbf{n}\,dA . \]

This is the physical content of Stokes’ theorem. It also gives us a new way to look at
vorticity. Instead of having to rely on the physical image of a little ball set spinning
by the shear action of a fluid, we can use the mathematical notion of the circulation
of that fluid in various planes. Note that the circulation/flux identity involves two
distinct flows: $\mathbf{F}$ and its vorticity $\operatorname{curl}\mathbf{F}$.

Definition 11.4 Let the continuously differentiable vector field $\mathbf{F}$ represent a steady
fluid flow field; then the vorticity flow field, or the vortex flow field, of $\mathbf{F}$ is
represented by the vector field $\mathbf{V} = \operatorname{curl}\mathbf{F}$.

We attribute the vorticity of a fluid flow, as given by the curl, to shearing forces
induced by the flow. We attribute these forces, in turn, to the relative motions of
nearby fluid particles. Let us now analyze the relative motions; as we show in
Theorem 11.10, they explain the divergence as well as the curl.

The velocity field $\mathbf{F}$ determines the overall motion of the fluid; it must, therefore,
determine the relative motion as well. To explore this connection and to clarify the
distinction between the two kinds of motion, we work through a third example:

\[ \mathbf{F} = (ky^2, 0, 0), \qquad k > 0 . \]



Think of this as a nonlinear modification of Example 2; the nonlinearity will help 
us see the relative motion more clearly. The fluid flows in straight lines parallel to 
the x-axis. In the (x,y)-plane, the speed of a particle is proportional to the square of 
its distance from the x-axis; in any parallel plane, the flow looks the same. 

As in the previous examples, the position $\mathbf{x}(t) = (x(t), y(t), z(t))$ of a particle at
time $t$ is determined by the differential equations

\[ \mathbf{x}'(t) = \mathbf{F}(\mathbf{x}(t)), \quad\text{that is,}\quad x' = ky^2, \quad y' = 0, \quad z' = 0 . \]

The solutions are

\[ x(t) = a + kb^2 t, \qquad y(t) = b, \qquad z(t) = c; \]

$a$, $b$, and $c$ are arbitrary constants of integration. These equations parametrize the
path of the particle that is initially (i.e., when $t = 0$) at the point $\mathbf{a} = (a, b, c)$. The
particle does not move if $b = 0$; therefore we assume $b \ne 0$ for the rest of the
discussion.

More generally, then, the equations say that the particle at the point $\mathbf{x} = (x, y, z)$ at
time $t = 0$ is at the point $\mathbf{h}_t(\mathbf{x}) = (x + ky^2 t,\ y,\ z)$ at an arbitrary (earlier or later) time $t$.
In other words, the flow defines, and is defined by, the family of maps $\mathbf{h}_t : \mathbb{R}^3 \to \mathbb{R}^3$:



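\[ \mathbf{h}_t : \begin{cases} u = x + ky^2 t, \\ v = y, \\ w = z; \end{cases}
   \qquad
   d(\mathbf{h}_t)_{\mathbf{x}} = \begin{pmatrix} 1 & 2kyt & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

Note that $\mathbf{h}_s \circ \mathbf{h}_t = \mathbf{h}_{s+t}$ for every $s$, $t$. This implies each $\mathbf{h}_t$ is invertible, because
$\mathbf{h}_{-t} \circ \mathbf{h}_t = \mathbf{h}_0 = \text{identity}$.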
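
The composition rule for the flow maps can be checked directly from the formula $\mathbf{h}_t(\mathbf{x}) = (x + ky^2 t, y, z)$. In this small sketch the function name `h` and the values of $k$, $s$, $t$, and the starting point are arbitrary choices made for illustration.

```python
k = 0.8   # flow constant (arbitrary test value)

def h(t, p):
    # Flow map of Example 3: carry the point p along the flow for time t.
    x, y, z = p
    return (x + k * y ** 2 * t, y, z)

p = (1.0, -2.0, 3.0)
s, t = 0.7, -1.9

composed = h(s, h(t, p))      # h_s after h_t
direct = h(s + t, p)          # h_{s+t}
round_trip = h(-t, h(t, p))   # h_{-t} after h_t should return p

print(composed, direct)
print(round_trip, p)
```

Because $y$ is unchanged by every $\mathbf{h}_t$, the displacements in $x$ simply add, which is the whole content of the rule for this example.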



To describe the relative motion of fluid particles near the particle that is initially at
the point $\mathbf{a}$, choose a small cube $W$ (a “window”) centered at $\mathbf{a}$, and let $\mathbf{h}_t(W)$ denote
the image of $W$ under the flow $\mathbf{h}_t$ ($t$ can take negative as well as positive values). If
$\mathbf{x}$ is a point in $W$, and $\Delta\mathbf{x} = \mathbf{x} - \mathbf{a}$ indicates the position of $\mathbf{x}$ in relation to $\mathbf{a}$, then the
vector

\[ \Delta\mathbf{x}(t) = \mathbf{h}_t(\mathbf{x}) - \mathbf{h}_t(\mathbf{a}) \]

in $\mathbf{h}_t(W)$ indicates how the relative position $\Delta\mathbf{x}$ varies over time. When the vectors
$\Delta\mathbf{x}(t)$ (for $t$ near 0) are translated to a common point (at the origin of a window $W^*$
with local coordinates $\Delta\mathbf{x} = (\Delta x, \Delta y, \Delta z)$), they exhibit the relative flow we seek to
describe. With these local coordinates in mind, we rewrite $\mathbf{h}_t(\mathbf{x})$ as

\[ \mathbf{h}_t(\mathbf{a} + \Delta\mathbf{x}) = \bigl(a + \Delta x + k(b + \Delta y)^2 t,\ b + \Delta y,\ c + \Delta z\bigr), \]

which shows how $\Delta\mathbf{x}(t) = \mathbf{h}_t(\mathbf{x}) - \mathbf{h}_t(\mathbf{a})$ can be described by the equations

\[ \begin{cases}
   \Delta x(t) = \Delta x + t\,\bigl(2kb\,\Delta y + k(\Delta y)^2\bigr), \\
   \Delta y(t) = \Delta y, \\
   \Delta z(t) = \Delta z .
   \end{cases} \]

From one point of view, these equations parametrize a straight line that is parallel
to the $\Delta x$-axis and passes through the arbitrary, but fixed, point $\Delta\mathbf{x} = (\Delta x, \Delta y, \Delta z)$
in $W^*$. This straight line is one of the relative flow lines we see in $W^*$, above. From
a second point of view (in which $\Delta\mathbf{x}$ is variable, not fixed), these equations define the
family of relative-flow maps $\mathbf{h}_t^* : W^* \to \mathbb{R}^3$. When the center of the original window
$W$ lies off the $(z,x)$-plane (i.e., when $b \ne 0$), the relative flow described by $\mathbf{h}_t^*$ is
along paths that head in opposite directions on opposite sides of the $\Delta x$-axis.

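The relative flow has its own velocity field; let us call it $\mathbf{F}^*$. The field vector $\mathbf{F}^*$
at the point $\Delta\mathbf{x} = (\Delta x, \Delta y, \Delta z)$ is, by definition, the velocity of the path $\Delta\mathbf{x}(t)$ at its
initial point:

\[ \mathbf{F}^*(\Delta x, \Delta y, \Delta z) = \left.\frac{d}{dt}\,\Delta\mathbf{x}(t)\right|_{t=0}
   = \bigl(2kb\,\Delta y + k(\Delta y)^2,\ 0,\ 0\bigr). \]

Alternatively, the relative flow field is the map $\mathbf{F}^* : W^* \to \mathbb{R}^3$:

\[ \mathbf{F}^* : \begin{cases} \Delta u = 2kb\,\Delta y + k(\Delta y)^2, \\ \Delta v = 0, \\ \Delta w = 0 . \end{cases} \]

Because we assume $W^*$ is small, $k(\Delta y)^2$ is negligible in comparison to $2kb\,\Delta y$ (recall
that $b \ne 0$), so $\mathbf{F}^*$ is well approximated by its linear part,

\[ \begin{pmatrix} 0 & 2kb & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
   \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix}, \]

which is just $d\mathbf{F}_{\mathbf{a}}(\Delta\mathbf{x})$. This gives us the following links between the relative flow
$\mathbf{F}^*$ in $W^*$ and the overall flow $\mathbf{F}$.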







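\begin{align*}
\mathbf{F}^*(\Delta\mathbf{x}) &= d\mathbf{F}_{\mathbf{a}}(\Delta\mathbf{x}) + o(\Delta\mathbf{x}), \\
\mathbf{h}_t^*(\Delta\mathbf{x}) &= \Delta\mathbf{x} + t\,d\mathbf{F}_{\mathbf{a}}(\Delta\mathbf{x}) + t\,o(\Delta\mathbf{x}).
\end{align*}

Because we can make $W^*$ arbitrarily small, we can neglect the higher-order terms
$o(\Delta\mathbf{x})$ in analyzing the relative flow, so that

\[ \mathbf{h}_t^* \approx d(\mathbf{h}_t^*)_{\mathbf{0}} = I + t\,d\mathbf{F}_{\mathbf{a}} . \]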

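Hence, we look to the action of $d\mathbf{F}_{\mathbf{a}}$ to explain the relative flow. First split $d\mathbf{F}_{\mathbf{a}}$
into its symmetric and antisymmetric parts (cf. p. 470); thus, $d\mathbf{F}_{\mathbf{a}} = S + \mathcal{A}$, where

\[ S = \begin{pmatrix} 0 & kb & 0 \\ kb & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
   \mathcal{A} = \begin{pmatrix} 0 & kb & 0 \\ -kb & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Because $S$ is symmetric, it is a pure strain (cf. Theorem 2.6, p. 40): it has three
mutually perpendicular eigenvectors with real eigenvalues, as follows:

\[ \lambda_1 = kb, \qquad \lambda_2 = -kb, \qquad \lambda_3 = 0, \]

\[ \mathbf{b}_1 = \begin{pmatrix} e/\sqrt2 \\ e/\sqrt2 \\ 0 \end{pmatrix}, \qquad
   \mathbf{b}_2 = \begin{pmatrix} -e/\sqrt2 \\ e/\sqrt2 \\ 0 \end{pmatrix}, \qquad
   \mathbf{b}_3 = \begin{pmatrix} 0 \\ 0 \\ e \end{pmatrix}. \]

We take $e > 0$, so each eigenvector has length $e$. As it happens, $\mathbf{b}_3$ is also an
eigenvector for $\mathcal{A}$ (with the same eigenvalue, 0). Furthermore, $\mathcal{A}$ maps the plane
determined by the other two eigenvectors to itself, because

\[ \mathcal{A}\mathbf{b}_1 = -kb\,\mathbf{b}_2 \quad\text{and}\quad \mathcal{A}\mathbf{b}_2 = kb\,\mathbf{b}_1 . \]

Therefore, the map $d(\mathbf{h}_t^*)_{\mathbf{0}} = I + t(S + \mathcal{A})$ acts in a simple way on the basis
$\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ for $\mathbb{R}^3$:

\begin{align*}
\mathbf{b}_1 &\to \mathbf{b}_1 + kbt\,\mathbf{b}_1 - kbt\,\mathbf{b}_2, \\
\mathbf{b}_2 &\to \mathbf{b}_2 - kbt\,\mathbf{b}_2 + kbt\,\mathbf{b}_1, \\
\mathbf{b}_3 &\to \mathbf{b}_3 .
\end{align*}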

Let us see how the cube $K = \mathbf{b}_1 \wedge \mathbf{b}_2 \wedge \mathbf{b}_3$ “flows” under this transformation.

First remove $\mathcal{A}$ in order to isolate the action of the symmetric component $S$:

\[ \mathbf{b}_1 \to \mathbf{b}_1 + kbt\,\mathbf{b}_1, \qquad \mathbf{b}_2 \to \mathbf{b}_2 - kbt\,\mathbf{b}_2, \qquad \mathbf{b}_3 \to \mathbf{b}_3 . \]

The vertical edge $\mathbf{b}_3$ is left unchanged, so we concentrate on what happens to the
base square $Q = \mathbf{b}_1 \wedge \mathbf{b}_2$ in the $(\Delta x, \Delta y)$-plane. When $t = 0$, we have the original
square $Q$; as $t$ changes, one edge of $Q$ grows longer and the other grows shorter at the
same rate, yielding a rectangle $Q_t$ whose sides are parallel to the square. (The white
arrows in the figure are $kbt\,\mathbf{b}_1$ and $-kbt\,\mathbf{b}_2$.) Because we are interested in $t \to 0$,
we can assume $|kbt| \ll 1$, so the deformation $Q \to Q_t$ is small. The original cube $K$
becomes a rectangular parallelepiped $K_t$ (a “brick”) whose base is the rectangle $Q_t$.
In effect, $I + tS$ is a strain that changes the shape of the cube, and also its volume:

\[ \operatorname{vol} K_t = e(1 + kbt) \times e(1 - kbt) \times e = e^3\bigl(1 - (kbt)^2\bigr). \]

To first order, $t$ has no effect on the volume. More precisely, the change in volume
is $\Delta\operatorname{vol} K = \operatorname{vol} K_t - \operatorname{vol} K = -(kbt)^2 e^3$, so the relative change (i.e., the change in
volume as a fraction of the volume itself) is the function

\[ V(t) = \frac{\Delta\operatorname{vol} K}{\operatorname{vol} K} = -(kbt)^2, \quad\text{for which } V'(0) = 0 . \]

It turns out that $V'(0) = 0$ is a consequence of the particular nature of $S$ and thus,
ultimately, of $\mathbf{F}$. We show that, for a general flow $\mathbf{F}$, the relative growth rate $V'(0)$
of volume is $\operatorname{div}\mathbf{F}$. Note that, in our example, $\operatorname{div}\mathbf{F} = 0$.

Now remove $S$ from $I + t(S + \mathcal{A})$ in order to isolate the action of the
antisymmetric component $\mathcal{A}$:

\[ \mathbf{b}_1 \to \mathbf{b}_1 - kbt\,\mathbf{b}_2, \qquad \mathbf{b}_2 \to \mathbf{b}_2 + kbt\,\mathbf{b}_1, \qquad \mathbf{b}_3 \to \mathbf{b}_3 . \]

Again it is sufficient to see what happens to the base square $Q$ in the $(\Delta x, \Delta y)$-plane.
This time the small changes (the black arrows in the figure) are perpendicular to
the sides; the effect is to turn the square, rather than to strain it. Nevertheless, $Q_t$ is
slightly larger than $Q$ when $t \ne 0$, but $t$ again has only a second-order effect on the
volume of $K_t$:

\[ \operatorname{vol} K_t = e\sqrt{1 + (kbt)^2} \times e\sqrt{1 + (kbt)^2} \times e = e^3 + O(t^2) \quad\text{as } t \to 0 . \]

We show that, in the general case, $I + t\mathcal{A}$ will continue to have only a second-order
effect on volume. To first order, $I + t\mathcal{A}$ is a uniform rotation.
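
Both volume formulas can be confirmed with determinants, since a linear map multiplies volumes by the absolute value of its determinant. The following sketch uses hypothetical helper names and arbitrary test values of $k$ and $b$; it evaluates $\det(I + tS)$ and $\det(I + t\mathcal{A})$ for the matrices of Example 3 and compares them with $1 - (kbt)^2$ and $1 + (kbt)^2$.

```python
def det3(M):
    # Determinant of a 3 x 3 matrix by cofactor expansion along the first row.
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def identity_plus(t, N):
    # The matrix I + t N.
    return [[(1.0 if i == j else 0.0) + t * N[i][j] for j in range(3)]
            for i in range(3)]

k, b = 2.0, 1.5   # arbitrary test values with b != 0

S_part = [[0.0, k * b, 0.0], [k * b, 0.0, 0.0], [0.0, 0.0, 0.0]]    # symmetric part
A_part = [[0.0, k * b, 0.0], [-k * b, 0.0, 0.0], [0.0, 0.0, 0.0]]   # antisymmetric part

rows = []
for t in (0.1, 0.01, 0.001):
    strain = det3(identity_plus(t, S_part))     # expected 1 - (k b t)^2
    rotation = det3(identity_plus(t, A_part))   # expected 1 + (k b t)^2
    rows.append((t, strain, rotation))
    print(t, strain, rotation)
```

In both cases the deviation from 1 is of order $t^2$, so neither part changes the volume to first order in $t$.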

Lemma 11.5. To first order in $t$, the flow

\[ I + t\mathcal{A} = \begin{pmatrix} 1 & kbt & 0 \\ -kbt & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

is a uniform rotation with angular velocity $\boldsymbol{\omega} = (0, 0, -kb)$, that is, a uniform
rotation around the positive $\Delta z$-axis with angular speed $-kb$.

Proof. In $(x,y,z)$-space, uniform rotation with angular speed $\omega$ around the positive
$z$-axis is given by the matrix function

\[ R_{\omega t} = \begin{pmatrix} \cos\omega t & -\sin\omega t & 0 \\ \sin\omega t & \cos\omega t & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

The Taylor approximations

\[ \cos\omega t = 1 + O(t^2) \quad\text{and}\quad \sin\omega t = \omega t + O(t^2) \quad\text{as } t \to 0 \]

imply

\[ R_{\omega t} = \begin{pmatrix} 1 & -\omega t & 0 \\ \omega t & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + O(t^2) \quad\text{as } t \to 0 . \]

With $\omega = -kb$, the matrix on the right is $I + t\mathcal{A}$.

□
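
The first-order agreement in Lemma 11.5 can be observed numerically. This sketch (helper names and the values of $k$ and $b$ are assumptions chosen for the test) compares the exact rotation matrix with angular speed $\omega = -kb$ against $I + t\mathcal{A}$ and reports the largest entrywise difference divided by $t^2$.

```python
import math

k, b = 2.0, 1.5    # arbitrary test values
omega = -k * b     # angular speed named in the lemma

def rotation(t):
    # Exact uniform rotation about the z-axis through the angle omega * t.
    c, s = math.cos(omega * t), math.sin(omega * t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def first_order(t):
    # The matrix I + t A for Example 3.
    return [[1.0, k * b * t, 0.0], [-k * b * t, 1.0, 0.0], [0.0, 0.0, 1.0]]

scaled_gaps = []
for t in (1e-1, 1e-2, 1e-3):
    R, L = rotation(t), first_order(t)
    gap = max(abs(R[i][j] - L[i][j]) for i in range(3) for j in range(3))
    scaled_gaps.append(gap / t ** 2)
    print(t, gap, gap / t ** 2)
```

The scaled gap settles near $\omega^2/2$, which comes from the $1 - \cos\omega t$ entries; the difference between the two matrices is therefore of order $t^2$, as the lemma asserts.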


To first order in $t$, $tS$ induces no rotation and $t\mathcal{A}$ induces no strain. Thus, to first
order in $t$ and in a sufficiently small window $W^*$, the relative flow

\[ \mathbf{h}_t^* \approx I + tS + t\mathcal{A} \]

rotates the cube $K$ around the positive $\Delta z$-axis with angular speed $-kb$ while altering
the lengths of its sides at the rates $kb$, $-kb$, and 0.

For the flow $\mathbf{F} = (ky^2, 0, 0)$ of our example, $\operatorname{curl}\mathbf{F}(\mathbf{a}) = (0, 0, -2kb) = 2\boldsymbol{\omega}$. We
originally interpreted the curl as describing the tendency of a flow to spin a small
object carried along with it. Example 3 suggests a new interpretation, in which the
curl directly describes the local rotation of a small blob of the fluid itself under the
action of the relative flow $\mathbf{h}_t^*$.

With the decomposition $d\mathbf{F}_{\mathbf{a}} = S + \mathcal{A}$, we were able to focus separately on the
two distinct aspects of the relative flow: strain and rotation. In fact, these aspects act
together, and together they should produce the shear flow in $W^*$ that we observed
at the outset. The following figure confirms this. Each gray arrow, as the sum of a
white and a black arrow, expresses the action of $I + tS + t\mathcal{A}$.



Using Example 3 as a guide, we now take up the general case of an arbitrary
continuously differentiable velocity field $\mathbf{F} = (A, B, C)$ defined on an open set $\Omega$
in $\mathbb{R}^3$. The corresponding flow $\mathbf{h}_t : \Omega \to \mathbb{R}^3$ is defined by $\mathbf{h}_t(\mathbf{x}) = \mathbf{x}(t)$, where $\mathbf{x}(t)$
is the unique solution of the initial-value problem

\[ \mathbf{x}'(t) = \mathbf{F}(\mathbf{x}(t)), \qquad \mathbf{x}(0) = \mathbf{x} . \]

To describe the relative flow of fluid particles near the particle that is initially at the
point $\mathbf{a}$, let $W$ be a small cube centered at $\mathbf{a}$, and let $\mathbf{x}$ be an arbitrary point in $W$.
Then $\Delta\mathbf{x} = \mathbf{x} - \mathbf{a}$ gives the position of $\mathbf{x}$ in relation to $\mathbf{a}$, and the vector





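\[ \Delta\mathbf{x}(t) = \mathbf{h}_t(\mathbf{x}) - \mathbf{h}_t(\mathbf{a}) = \mathbf{h}_t(\mathbf{a} + \Delta\mathbf{x}) - \mathbf{h}_t(\mathbf{a}) = \mathbf{h}_t^*(\Delta\mathbf{x}) \]

describes how that relative position varies over time. To get a formula that ties $\mathbf{h}_t^*$
back to $\mathbf{F}$, we first use Taylor’s theorem to write

\[ \mathbf{h}_t(\mathbf{x}) = \mathbf{x}(t) = \mathbf{x}(0) + t\,\mathbf{x}'(0) + O(t^2) = \mathbf{x} + t\,\mathbf{F}(\mathbf{x}) + O(t^2) \quad\text{as } t \to 0 . \]

Note that $\mathbf{x}'(0) = \mathbf{F}(\mathbf{x}(0)) = \mathbf{F}(\mathbf{x})$ and that $\mathbf{x}(t)$ has the continuous second
derivative required by Taylor’s theorem, because $\mathbf{x}'(t) = \mathbf{F}(\mathbf{x}(t))$ and $\mathbf{F}$ is continuously
differentiable. Because $\mathbf{x} = \mathbf{a} + \Delta\mathbf{x}$, it follows that

\begin{align*}
\mathbf{h}_t(\mathbf{a} + \Delta\mathbf{x}) &= \mathbf{a} + \Delta\mathbf{x} + t\,\mathbf{F}(\mathbf{a} + \Delta\mathbf{x}) + O(t^2) \\
\text{and}\quad \mathbf{h}_t(\mathbf{a}) &= \mathbf{a} + t\,\mathbf{F}(\mathbf{a}) + O(t^2)
\end{align*}

as $t \to 0$, and hence (using the differentiability of $\mathbf{F}$ at $\mathbf{a}$) that

\begin{align*}
\mathbf{h}_t^*(\Delta\mathbf{x}) = \mathbf{h}_t(\mathbf{a} + \Delta\mathbf{x}) - \mathbf{h}_t(\mathbf{a})
  &= \Delta\mathbf{x} + t\bigl(\mathbf{F}(\mathbf{a} + \Delta\mathbf{x}) - \mathbf{F}(\mathbf{a})\bigr) + O(t^2) \\
  &= \Delta\mathbf{x} + t\bigl(d\mathbf{F}_{\mathbf{a}}(\Delta\mathbf{x}) + o(\Delta\mathbf{x})\bigr) + O(t^2)
\end{align*}

as $\Delta\mathbf{x} \to \mathbf{0}$ and $t \to 0$.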

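As we did in Example 3, we consider that this formula for the relative flow defines
a family of maps

\[ \mathbf{h}_t^* : W^* \to \mathbb{R}^3 \]

of a window $W^*$ centered at the origin of local coordinates $\Delta\mathbf{x}$. Compare the general
formula for $\mathbf{h}_t^*$ that we have just obtained with the earlier one in Example 3 (p. 475);
the only new ingredient is the higher-order term $O(t^2)$. The derivative $d\mathbf{F}_{\mathbf{a}}$ is still
the key to the relative flow. The velocity field $\mathbf{F}^*$ of the relative flow is

\[ \mathbf{F}^*(\Delta\mathbf{x}) = \left.\frac{d}{dt}\,\mathbf{h}_t^*(\Delta\mathbf{x})\right|_{t=0}
   = d\mathbf{F}_{\mathbf{a}}(\Delta\mathbf{x}) + o(\Delta\mathbf{x}) \quad\text{as } \Delta\mathbf{x} \to \mathbf{0} . \]

Furthermore, when $W^*$ is small enough for us to ignore higher-order terms in $\Delta\mathbf{x}$,
we can approximate $\mathbf{h}_t^*$ by its derivative at the origin $\Delta\mathbf{x} = \mathbf{0}$:

\[ \mathbf{h}_t^* \approx d(\mathbf{h}_t^*)_{\mathbf{0}} = I + t\,d\mathbf{F}_{\mathbf{a}} + O(t^2) \quad\text{as } t \to 0 . \]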


Let us now use the approximation $d(\mathbf h_t^*)_{\mathbf 0}$ to get an idea how $\mathbf h_t^*$ affects volumes
near $\Delta\mathbf x = \mathbf 0$. We expect that the volume of a small cube of fluid may change as the
fluid flows; we want to measure that change. As before, it is helpful to split $d\mathbf F_{\mathbf a}$ into
its symmetric and antisymmetric parts: $d\mathbf F_{\mathbf a} = \mathcal S + \mathcal A$. The symmetric matrix $\mathcal S$ has
mutually orthogonal eigenvectors $\mathbf b_1, \mathbf b_2, \mathbf b_3$ of length $\varepsilon$, with eigenvalues $\lambda_1, \lambda_2, \lambda_3$,
respectively. Choose an order for the eigenvectors so that the cube $K = \mathbf b_1 \wedge \mathbf b_2 \wedge \mathbf b_3$
has positive volume $\varepsilon^3$. Its image $K_t = (I + t\mathcal S)(K)$ is the rectangular parallelepiped
with




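$$\begin{aligned}
\operatorname{vol} K_t &= \varepsilon(1 + t\lambda_1) \times \varepsilon(1 + t\lambda_2) \times \varepsilon(1 + t\lambda_3)\\
&= \varepsilon^3\bigl(1 + t(\lambda_1 + \lambda_2 + \lambda_3) + O(t^2)\bigr) \quad\text{as } t \to 0.
\end{aligned}$$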

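Because the sum of the eigenvalues of a matrix equals the trace of that matrix, and
because the diagonal elements of an antisymmetric matrix are all 0, we have

$$\lambda_1 + \lambda_2 + \lambda_3 = \operatorname{tr}\mathcal S = \operatorname{tr}(\mathcal S + \mathcal A) = \operatorname{tr} d\mathbf F_{\mathbf a}.$$

Furthermore, because $\mathbf F = (A, B, C)$,

$$\operatorname{tr} d\mathbf F_{\mathbf a} = \frac{\partial A}{\partial x}(\mathbf a) + \frac{\partial B}{\partial y}(\mathbf a) + \frac{\partial C}{\partial z}(\mathbf a) = \operatorname{div}\mathbf F(\mathbf a),$$

finally linking volume change to divergence:

$$\operatorname{vol} K_t = \operatorname{vol} K\,\bigl(1 + t\operatorname{div}\mathbf F(\mathbf a) + O(t^2)\bigr) \quad\text{as } t \to 0.$$

Thus, the relative (or percentage) change in volume (p. 477) is

$$V(t) = \frac{\operatorname{vol} K_t - \operatorname{vol} K}{\operatorname{vol} K} = t\operatorname{div}\mathbf F(\mathbf a) + O(t^2) \quad\text{as } t \to 0;$$

consequently, $V'(0) = \operatorname{div}\mathbf F(\mathbf a)$. Note that we obtained this result under the assumption
we could replace the relative flow by its linear approximation and we could
restrict ourselves to cubes whose edges were the eigenvectors of the symmetric part
of the linear approximation.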

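We now remove the restrictive assumptions and determine $V'(0)$ using the original
nonlinear map $\mathbf h_t^*$ itself. First, let $K$ range over closed sets with volume that
contain the point $\mathbf a$ (so $\Delta\mathbf x = \mathbf 0$). Let $K_t = \mathbf h_t^*(K)$; its volume is

$$\operatorname{vol}\mathbf h_t^*(K) = \iiint_K J_{\mathbf h_t^*}\,dV$$

($J_{\mathbf h_t^*}$ is the Jacobian of $\mathbf h_t^*$). Because $\mathbf h_t^*$ is nonlinear, the ratio

$$\frac{\operatorname{vol}\mathbf h_t^*(K) - \operatorname{vol} K}{\operatorname{vol} K}$$

is no longer independent of the diameter $\delta K$ of $K$ (Definition 8.14, p. 291). However,
to determine percentage growth of volume, it is sufficient to see what happens
to this ratio as $\delta K \to 0$. Because $\operatorname{vol}\mathbf h_t^*(K)$ is a set function of integral type (cf.
pp. 310-312), we can calculate its derivative as $K$ shrinks down to the point $\mathbf a$; by
Theorem 8.39 (p. 312), we find

$$\lim_{\delta K \to 0} \frac{\operatorname{vol}\mathbf h_t^*(K)}{\operatorname{vol} K} = J_{\mathbf h_t^*}(\mathbf 0).$$

This allows us to define the percentage change of volume as the limit

$$\lim_{\delta K \to 0} \frac{\operatorname{vol}\mathbf h_t^*(K) - \operatorname{vol} K}{\operatorname{vol} K} = J_{\mathbf h_t^*}(\mathbf 0) - 1.$$

Because

$$J_{\mathbf h_t^*}(\mathbf 0) = \det d(\mathbf h_t^*)_{\mathbf 0} = 1 + t\operatorname{div}\mathbf F(\mathbf a) + O(t^2) \quad\text{as } t \to 0,$$

we again have

$$V(t) = t\operatorname{div}\mathbf F(\mathbf a) + O(t^2) \quad\text{as } t \to 0,$$

and $V'(0) = \operatorname{div}\mathbf F(\mathbf a)$.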
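The expansion of the Jacobian determinant used above, $\det(I + t\,d\mathbf F_{\mathbf a}) = 1 + t\operatorname{tr} d\mathbf F_{\mathbf a} + O(t^2)$, can be checked numerically. The following is a minimal sketch in plain Python; the matrix `dF_a` is a hypothetical example chosen only for illustration, and the helper names are not from the text.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def linearized_jacobian(dF, t):
    """Return det(I + t*dF), the Jacobian of the linearized relative flow."""
    m = [[(1.0 if r == c else 0.0) + t * dF[r][c] for c in range(3)]
         for r in range(3)]
    return det3(m)


# Hypothetical derivative matrix dF_a (rows: gradients of A, B, C).
dF_a = [[0.5, 2.0, -1.0],
        [0.0, -0.3, 4.0],
        [1.5, 1.0, 0.7]]
div_F = dF_a[0][0] + dF_a[1][1] + dF_a[2][2]  # trace of dF_a = div F(a)

for t in (1e-1, 1e-2, 1e-3):
    error = linearized_jacobian(dF_a, t) - (1.0 + t * div_F)
    # error / t**2 should stay bounded as t shrinks (an O(t^2) remainder).
    print(t, error, error / t**2)
```

The printed ratio `error / t**2` settling toward a fixed number is what the $O(t^2)$ remainder predicts; the first-order coefficient is the trace, that is, the divergence.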

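We now consider the action of the flow $I + t\mathcal A$, where $\mathcal A$ is the antisymmetric part
of $d\mathbf F_{\mathbf a}$. As we noted on page 470, when $\mathbf F = (A, B, C)$, then

$$\mathcal A = \frac12\begin{pmatrix} 0 & -\gamma & \beta\\ \gamma & 0 & -\alpha\\ -\beta & \alpha & 0 \end{pmatrix}, \quad\text{where}\quad \mathbf Z = \begin{pmatrix}\alpha\\ \beta\\ \gamma\end{pmatrix} = \begin{pmatrix} C_y - B_z\\ A_z - C_x\\ B_x - A_y \end{pmatrix},$$

and all functions are evaluated at $\mathbf x = \mathbf a$.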

Lemma 11.6. The vector $\mathbf Z$ is an eigenvector of $\mathcal A$ with eigenvalue 0. □

The following theorem is the analogue, for a general flow, of Lemma 11.5 for the
flow of Example 3. Note that $\mathbf Z = \operatorname{curl}\mathbf F(\mathbf a)$.

Theorem 11.9. To first order in $t$, the flow $I + t\mathcal A$ is a uniform rotation with angular
velocity $\tfrac12\mathbf Z$.

Proof. By definition, a rotation matrix $R$ is an orthogonal matrix (i.e., one whose
transpose equals its inverse: $R^{\mathsf T}R = I$) with positive determinant. Let $R_t = I + t\mathcal A$.
Then $R_t^{\mathsf T} = I - t\mathcal A$ and $(R_t)^{\mathsf T}R_t = I - t^2\mathcal A^2$; thus, to first order in $t$, $R_t$ is orthogonal.
Because $\mathbf Z$ is an eigenvector of $\mathcal A$ with eigenvalue 0, $R_t\mathbf Z = \mathbf Z$. This implies $\mathbf Z$ is the
rotation axis of each $R_t$.

To determine the angular speed and show that it is constant, we need more detailed
information about $\mathcal A$. The vectors

$$\mathbf X_1 = \frac{1}{\sqrt{\alpha^2 + \beta^2}}\begin{pmatrix} -\beta\\ \alpha\\ 0 \end{pmatrix}, \qquad \mathbf X_2 = \frac{\mathbf Z \times \mathbf X_1}{\|\mathbf Z\|} = \frac{1}{\|\mathbf Z\|\sqrt{\alpha^2 + \beta^2}}\begin{pmatrix} -\alpha\gamma\\ -\beta\gamma\\ \alpha^2 + \beta^2 \end{pmatrix}$$

will provide this information. They are orthogonal unit vectors that span a plane
orthogonal to $\mathbf Z$; we claim the rotation leaves that plane invariant. This follows from
quick calculations that show

$$\mathcal A\mathbf X_1 = k\mathbf X_2 \quad\text{and}\quad \mathcal A\mathbf X_2 = -k\mathbf X_1,$$

where $k = \tfrac12\|\mathbf Z\|$. In terms of the basis $\{\mathbf X_1, \mathbf X_2, \mathbf Z\}$, the matrix for $\mathcal A$ is

$$\begin{pmatrix} 0 & -k & 0\\ k & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}$$

and the matrix for the flow $I + t\mathcal A$ is

$$\begin{pmatrix} 1 & -kt & 0\\ kt & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.$$

It follows from Lemma 11.5 that, to first order in $t$, $I + t\mathcal A$ is a uniform rotation with
angular velocity $k\,\mathbf Z/\|\mathbf Z\| = \tfrac12\mathbf Z$. □
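The two facts the proof relies on, that $\mathbf Z$ is annihilated by the antisymmetric part of $d\mathbf F_{\mathbf a}$ and that this antisymmetric part acts on any vector $\mathbf x$ as $\tfrac12\,\mathbf Z \times \mathbf x$, are easy to test on a sample matrix. This is only an illustrative sketch: `dF_a` and the test vector `x` are hypothetical values, and the function names are not from the text.

```python
def antisymmetric_part(dF):
    """Return (dF - dF^T)/2 for a 3x3 matrix given as a list of rows."""
    return [[0.5 * (dF[r][c] - dF[c][r]) for c in range(3)] for r in range(3)]


def curl_vector(dF):
    """Return Z = (C_y - B_z, A_z - C_x, B_x - A_y); the rows of dF are the
    gradients of the components A, B, C."""
    return (dF[2][1] - dF[1][2], dF[0][2] - dF[2][0], dF[1][0] - dF[0][1])


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


# Hypothetical derivative matrix dF_a, used only for illustration.
dF_a = [[0.5, 2.0, -1.0],
        [0.0, -0.3, 4.0],
        [1.5, 1.0, 0.7]]
A = antisymmetric_part(dF_a)
Z = curl_vector(dF_a)
x = (0.3, -1.2, 2.0)  # arbitrary test vector

print(mat_vec(A, Z))                                 # vanishes up to rounding
print(mat_vec(A, x))                                 # action of the matrix ...
print(tuple(0.5 * w for w in cross(Z, x)))           # ... agrees with (1/2) Z x x
```

The identity in the last two lines is the infinitesimal form of a rotation about the axis $\mathbf Z$ with angular speed $\tfrac12\|\mathbf Z\|$.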

Theorem 11.10. Suppose a steady fluid flow is governed by the velocity field $\mathbf F$,
and $W^*$ is a frame that is translated in space so as to remain centered on the fluid
particle initially at the point $\mathbf a$. Then a vanishingly small ball of fluid centered at the
origin of $W^*$ does the following.

• Rotates with instantaneous angular velocity $\tfrac12\operatorname{curl}\mathbf F(\mathbf a)$

• Changes its relative volume at the instantaneous rate $\operatorname{div}\mathbf F(\mathbf a)$ □


11.3 Stokes’ theorem 

Stokes' theorem is our final setting for the assertion that the boundary operator
and the exterior derivative are adjoints in the symbolic integral pairing: $\langle \partial S, \omega\rangle = \langle S, d\omega\rangle$.
In this setting, $S$ is a piecewise-smooth oriented surface in $(x,y,z)$-space,
and $\omega = \omega(x,y,z)$ is a differential 1-form. In physical terms, Stokes' theorem
equates the circulation of a flow around the boundary of a surface to the flux of
the vorticity field (Definition 11.4, p. 473) of that flow through the surface.

To begin the process of proving Stokes' theorem, we first note how similar it is
to Green's theorem. Both assert that

$$\oint_{\partial S} \omega = \iint_S d\omega,$$

where $\omega$ is a 1-form and $S$ is an oriented 2-dimensional region. The only difference
is the dimension of the ambient space: Green's theorem is set in $\mathbb R^2$ (forcing $S$ to be
planar), whereas Stokes' theorem is set in $\mathbb R^3$ (allowing $S$ to be curved). But because
a surface patch in space is parametrized by a plane region, we are able to use Green's
theorem to prove Stokes'.

The proof follows quickly once the ingredients are assembled. Recall that a
piecewise-smooth oriented surface (Definitions 10.7, p. 412, and 10.12, p. 420) is
a finite sum of oriented surface patches whose common boundary segments have
opposite orientations. An oriented surface patch $S$ (Definition 10.2, p. 392) is the
image $S = \mathbf f(U)$ of a closed bounded, positively oriented set $U \subset \Omega$ with area, where
the parametrization

$$\mathbf f : \Omega \to \mathbb R^3$$

is a continuously differentiable 1-1 immersion on the open set $\Omega \subset \mathbb R^2$. The map $\mathbf f$
is an immersion at the point $\mathbf a$ if the derivative $d\mathbf f_{\mathbf a} : \mathbb R^2 \to \mathbb R^3$ is 1-1.

The surface integral of a 2-form $\alpha$ defined everywhere on the surface patch $S$ is
given by the pullback $\mathbf f^*$:

$$\iint_S \alpha = \iint_U \mathbf f^*\alpha$$

(Definition 10.4, as reformulated on p. 432). The value of the surface integral is
independent of the parametrization used to represent $S$ (Corollary 10.4, p. 401). The
surface integral of a 2-form over a piecewise-smooth oriented surface is the sum of
the integrals over its smooth oriented pieces (Definition 10.13, p. 420). We also use
the facts that the pullback $\mathbf f^*$ commutes with the exterior derivative (Theorem 10.16,
p. 437), and that $\mathbf f$ and $\mathbf f^*$ are adjoints on piecewise-smooth curves $C$ (Exercise 4.37,
p. 149). Because these facts were first established in slightly different circumstances,
we reconstruct them here, taking the target of $\mathbf f$ to be $\mathbb R^3$ instead of $\mathbb R^2$.

Lemma 11.7. For any continuously differentiable map $\mathbf f : \Omega \to \mathbb R^3$ and $k$-form
$\alpha(x,y,z)$, $\mathbf f^*(d\alpha) = d(\mathbf f^*\alpha)$.

Proof. All aspects of the proof of Theorem 10.16 for $k \ne 2$ carry over, mutatis mutandis.
The only difference occurs for $k = 2$, where the 3-form $d\alpha$ may be nonzero.
However, the pullbacks $\mathbf f^*(d\alpha)$ and $d(\mathbf f^*\alpha)$ are 3-forms in two variables, so they are
both zero and $\mathbf f^*(d\alpha) = d(\mathbf f^*\alpha)$ for all $k$. □

The proof of the second lemma exploits, in addition, the fact that $\mathbf f$ is a 1-1
immersion.

Lemma 11.8. For any piecewise-smooth oriented curve $C$ and 1-form $\beta$ defined
everywhere on $\mathbf f(C)$,

$$\langle \mathbf f(C), \beta\rangle = \int_{\mathbf f(C)} \beta = \int_C \mathbf f^*\beta = \langle C, \mathbf f^*(\beta)\rangle.$$

Proof. Let $C = C_1 + \cdots + C_m$ be a decomposition into smooth oriented curves, each
of which is either simple or is a simple closed curve. Because $\mathbf f$ is a 1-1 immersion,
each $\mathbf f(C_i)$ is likewise either simple, or a simple closed, smooth oriented curve,
providing a decomposition

$$\mathbf f(C) = \mathbf f(C_1) + \cdots + \mathbf f(C_m).$$

Let $\mathbf u_i(t) = (u_i(t), v_i(t))$, $a_i \le t \le b_i$, parametrize $C_i$; then

$$\mathbf x_i(t) = \mathbf f(\mathbf u_i(t)),$$

$$(x_i(t), y_i(t), z_i(t)) = \bigl(x(u_i(t), v_i(t)),\; y(u_i(t), v_i(t)),\; z(u_i(t), v_i(t))\bigr),$$

parametrizes $\mathbf f(C_i)$ with the same domain $a_i \le t \le b_i$. Suppose, for simplicity, that
$\beta = P\,dx$; then

$$\mathbf f^*\beta = P(\mathbf f(\mathbf u))\,\mathbf f^*(dx) = P(\mathbf f(\mathbf u))\,(x_u\,du + x_v\,dv)$$

and

$$\int_{C_i} \mathbf f^*\beta = \int_{a_i}^{b_i} P(\mathbf f(\mathbf u_i(t)))\,(x_u u_i' + x_v v_i')\,dt.$$

On the other hand,

$$\int_{\mathbf f(C_i)} \beta = \int_{a_i}^{b_i} P(\mathbf f(\mathbf u_i(t)))\,x_i'\,dt = \int_{a_i}^{b_i} P(\mathbf f(\mathbf u_i(t)))\,(x_u u_i' + x_v v_i')\,dt = \int_{C_i} \mathbf f^*\beta.$$

You can treat the 1-forms $\beta = Q\,dy$ and $\beta = R\,dz$ the same way. □

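Theorem 11.11. Let $\mathbf f : \Omega \to \mathbb R^3$ be a continuously differentiable 1-1 immersion on
an open set $\Omega \subset \mathbb R^2$. Let $U \subset \Omega$ be a closed, bounded, positively oriented set with
area on which Green's theorem holds. If $\omega$ is a continuously differentiable 1-form
defined on the oriented surface patch $S = \mathbf f(U)$, then

$$\oint_{\partial S} \omega = \iint_S d\omega.$$

Proof. We have the following sequence of equalities:

$$\begin{aligned}
\iint_S d\omega = \iint_{\mathbf f(U)} d\omega &= \iint_U \mathbf f^* d\omega && \text{definition of surface integral}\\
&= \iint_U d(\mathbf f^*\omega) && \text{$d$ and $\mathbf f^*$ commute (Lemma 11.7)}\\
&= \oint_{\partial U} \mathbf f^*\omega && \text{Green's theorem for $U$}\\
&= \oint_{\mathbf f(\partial U)} \omega && \text{$\mathbf f$ and $\mathbf f^*$ are adjoints (Lemma 11.8)}\\
&= \oint_{\partial S} \omega.
\end{aligned}$$

□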


Corollary 11.12 (Stokes' theorem) Suppose $S = S_1 + \cdots + S_m$ is a piecewise-smooth
oriented surface, and suppose the theorem holds on each of the surface patches $S_i$,
$i = 1, \ldots, m$; then

$$\oint_{\partial S} \omega = \iint_S d\omega.$$

Proof. Because the common segments of the various $\partial S_i$ have opposite orientation,
path integrals over those segments cancel in pairs; only the segments of $\partial S_i$ that lie
in $\partial S$ make a nonzero contribution to the path integral. Therefore,

$$\oint_{\partial S} \omega = \sum_{i=1}^m \oint_{\partial S_i} \omega = \sum_{i=1}^m \iint_{S_i} d\omega = \iint_S d\omega.$$

The final equality is just the definition of a surface integral over $S$. □

If $\omega = A\,dx + B\,dy + C\,dz$, then

$$d\omega = (C_y - B_z)\,dy\,dz + (A_z - C_x)\,dz\,dx + (B_x - A_y)\,dx\,dy,$$

and Stokes' theorem states that

$$\oint_{\partial S} A\,dx + B\,dy + C\,dz = \iint_S (C_y - B_z)\,dy\,dz + (A_z - C_x)\,dz\,dx + (B_x - A_y)\,dx\,dy.$$

Our discussion of the connection between differential forms and scalar and vector
fields (pp. 460-462) makes it easy to convert Stokes' theorem into a statement about
the integrals of vector fields. If $\omega = A\,dx + B\,dy + C\,dz$, then

co = C 0 p <-> F = (A,B,C) 


and 

dco = d (ft)|p') = frtcuriF <—► curlF = {Cy — B : ,A Z — C X ,B X — A v ). 

If $\mathbf t$ is the positively oriented unit tangent vector on $\partial S$, then (cf. p. 19)

$$\oint_{\partial S} A\,dx + B\,dy + C\,dz = \oint_{\partial S} \mathbf F \cdot \mathbf t\,ds = \text{circulation of $\mathbf F$ around $\partial S$}.$$

If $\mathbf n$ is the unit normal that determines the orientation of $S$, then (cf. pp. 403-404)

$$\iint_S (C_y - B_z)\,dy\,dz + (A_z - C_x)\,dz\,dx + (B_x - A_y)\,dx\,dy = \iint_S \operatorname{curl}\mathbf F \cdot \mathbf n\,dA = \text{total flux of $\operatorname{curl}\mathbf F$ through $S$}.$$

With these connections, we can restate Stokes' theorem for vector fields.


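Theorem 11.13 (Physical form of Stokes' theorem). If $\mathbf F$ is a continuously differentiable
flow field defined on a piecewise-smooth oriented surface $S$, and $\operatorname{curl}\mathbf F$ is
its vorticity field, then

$$\oint_{\partial S} \mathbf F \cdot \mathbf t\,ds = \iint_S \operatorname{curl}\mathbf F \cdot \mathbf n\,dA,$$

circulation of $\mathbf F$ around $\partial S$ = total flux of $\operatorname{curl}\mathbf F$ through $S$.

□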


To illustrate the theorem in its physical form, let us add an extended example to 
the two we considered in the last section. We take the flow field and its vorticity 
field (Definition 11.4, p. 473) to be 

F = (yz, —xz, 0), curlF = (x,y, —2z). 

This flow is similar to the flow of Example 1, where $\mathbf F = (-\omega y, \omega x, 0)$ (pp. 462-463).
In that case, the entire fluid rotated rigidly (i.e., without the particles changing
their relative positions over time) with constant angular speed $\omega$ around the z-axis.
The flow in Example 3 is only slightly more complicated: the variable $-z$ simply
replaces the constant $\omega$. Thus the fluid at each level $z = c$ rotates rigidly around the
z-axis with its own constant angular speed $\omega = -c$.


Example 3: 

$\mathbf F = (yz, -xz, 0)$


Shearing actions 
of the fluid 


Vortex lines of F 


The vortex lines have 
a saddle at the origin 



Imagine the fluid is separated into parallel disks. The disks above the $(x,y)$-plane
rotate clockwise (when viewed from above); those below, counterclockwise. The 
farther a disk is from the (x,y) -plane, the faster it spins. This difference introduces a 
new shearing action within the fluid that we did not see in Example 1. There, the only 
shearing action was caused by the greater speed of particles farther from the z-axis. 
That is present here as well, and accounts for the component —2z in the vorticity 
vector curlF = (x,y, —2z). The new shearing action, between disks, accounts for the 
other components. 

Let us examine all this in more detail. To connect the circulation of a field $\mathbf F$ to
the flux of its vorticity field $\operatorname{curl}\mathbf F$, we naturally think of the second field, $\operatorname{curl}\mathbf F$, as
a flow of "particles." The paths of these particles are called the vortex lines of $\mathbf F$.
In the present case, the vortex lines are paths $\mathbf x(t) = (x(t), y(t), z(t))$ that satisfy the
differential equations (cf. p. 462)

$$\mathbf x'(t) = \operatorname{curl}\mathbf F(\mathbf x(t)), \quad\text{or}\quad x' = x,\; y' = y,\; z' = -2z.$$

The general solution here is the three-parameter family

$$\mathbf x(t) = (a e^{t},\, b e^{t},\, c e^{-2t}).$$

This describes the motion of the particle that is initially at the point $\mathbf x(0) = (a, b, c)$,
which can thus be anywhere in space. In particular, if the initial point lies in the
vertical plane $y = mx$ (so that $b = ma$), then the entire path is in the same plane,
because

$$y(t) = b e^{t} = m a e^{t} = m\,x(t).$$

In fact, this equation shows that we obtain all paths by rotating the paths that lie in
a single vertical plane (e.g., in the plane $y = 0$) around the z-axis. The flow of $\operatorname{curl}\mathbf F$
has rotational symmetry around the z-axis.

The solutions on the plane $y = 0$ (i.e., where $b = 0$) are

$$x(t) = a e^{t}, \quad y(t) = 0, \quad z(t) = c e^{-2t}.$$

For a given $a \ne 0$ and $c$, this vortex line lies on the graph of the function

$$z = k/x^2, \quad k = c a^2,$$

in the $(x,z)$-plane. Particles on these trajectories flow simultaneously away from the
z-axis and toward the $(x,y)$-plane. Particles on the z-axis (where $a = 0$) flow directly
toward the origin. Particles in the $(x,y)$-plane ($c = 0$) flow radially away from the
origin on straight lines. The origin is said to be a saddle point of the flow.
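As a quick numerical sanity check on the vortex lines, one can confirm that the stated solution satisfies the differential equations and that the product $z\,(x^2 + y^2)$ stays constant along each path (for $y = 0$ this is the relation $z = k/x^2$ with $k = ca^2$). The sketch below uses an arbitrarily chosen initial point; the function names are illustrative, not from the text.

```python
import math


def vortex_line(a, b, c, t):
    """Position at time t of the particle that starts at (a, b, c)."""
    return (a * math.exp(t), b * math.exp(t), c * math.exp(-2.0 * t))


def curl_F(x, y, z):
    """Vorticity field of F = (yz, -xz, 0)."""
    return (x, y, -2.0 * z)


def velocity(a, b, c, t, h=1e-6):
    """Central-difference approximation to the derivative of the path."""
    p, q = vortex_line(a, b, c, t + h), vortex_line(a, b, c, t - h)
    return tuple((pi - qi) / (2.0 * h) for pi, qi in zip(p, q))


a, b, c = 1.5, -0.5, 2.0  # arbitrary initial point, for illustration only
for t in (-1.0, 0.0, 0.7):
    x, y, z = vortex_line(a, b, c, t)
    # The velocity should match curl F at the current position, and the
    # quantity z*(x^2 + y^2) should equal c*(a^2 + b^2) at every time.
    print(velocity(a, b, c, t), curl_F(x, y, z), z * (x * x + y * y))
```

The constancy of $z\,(x^2 + y^2)$ is the reason each vortex line stays on a single surface of the form $z = k/(x^2 + y^2)$.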




When the graph $z = k/x^2$ is rotated around the z-axis, it sweeps out the horn-shaped
surface that is the graph of $z = k/(x^2 + y^2)$. (In the figure on the right, above,
the portion of each surface that lies in the first quadrant has been cut away for better
visibility.) The horn-shaped surfaces make it easy to visualize the two flows and
the way they are related: the flow lines of $\operatorname{curl}\mathbf F$ (the vortex lines) are intersections
of those surfaces with the vertical planes $y = mx$ that are "hinged" on the z-axis; the
flow lines of $\mathbf F$ itself are their intersections with the horizontal planes $z = c$.

To examine the link between the two fields, we need an oriented surface $S$. We
take $S$ to be a cylinder centered at the origin; let its axis be the z-axis, and let its
orientation normal $\mathbf n$ be outward-pointing. Let the radius be $R$ and the height $2H$.
The boundary of $S$ is a pair of circles. The orientation induced by $S$ on the upper
one, $\partial S_1$, is clockwise when viewed from above; on the lower one, $\partial S_2$, it is
counterclockwise. We want to compare the circulation of $\mathbf F$ around $\partial S = \partial S_1 + \partial S_2$ with
the total flux of $\operatorname{curl}\mathbf F$ through $S$.


How the flows intersect 

z = k/{x 2 +y 2 ) 


The surface S


Circulation of F 
around dS 


Total flux of curlF 
through S 


The flow $\mathbf F$ is everywhere tangent to $\partial S$, so the circulation is quite simple to
calculate. The upper circle, $\partial S_1$, is in the horizontal plane $z = H$, wherein the fluid
governed by $\mathbf F$ rotates with constant angular speed $\omega = H$ in the clockwise direction,
the direction of $\partial S_1$. Because $\partial S_1$ is a circle of radius $R$, the fluid on it moves with
speed $HR$. The circulation of $\mathbf F$ on $\partial S_1$ is therefore just the product of this speed and
the length of $\partial S_1$, namely the positive quantity $2\pi R^2 H$. Taking orientations properly
into account, we see that the circulation around the bottom of the cylinder, $\partial S_2$, is
the same; thus,

circulation of $\mathbf F$ around $\partial S$ = $4\pi R^2 H$.



Now consider the vortex field $\operatorname{curl}\mathbf F$ on $S$. Although $\operatorname{curl}\mathbf F$ is neither constant in
magnitude nor perpendicular to $S$, we now show that its projection onto the orienting
normal $\mathbf n$ is constant, so total flux $\Phi$ is also simple to calculate. First, write the
coordinates of a point on $S$ in the form

$$(x, y, z) = (R\cos\theta, R\sin\theta, z);$$

$(R, \theta, z)$ are the cylindrical coordinates of the point. At this point, the vectors $\mathbf n$ and
$\operatorname{curl}\mathbf F$ have the form

$$\mathbf n = (\cos\theta, \sin\theta, 0) \quad\text{and}\quad \operatorname{curl}\mathbf F = (R\cos\theta, R\sin\theta, -2z),$$

from which it follows that

$$\operatorname{curl}\mathbf F \cdot \mathbf n = R$$

everywhere on $S$. In principle (Definition 10.1, p. 389), $\Phi$ is the product of this
projection length and the area of $S$. When the projection length varies, the product
needs to be rendered as a surface integral, but that is unnecessary here. Thus we
have

$$\Phi = \operatorname{curl}\mathbf F \cdot \mathbf n\;\operatorname{area} S = R \times (2\pi R \times 2H) = 4\pi R^2 H;$$

hence

circulation of $\mathbf F$ around $\partial S$ = total flux of $\operatorname{curl}\mathbf F$ through $S$,


as we wished to show.

We now confirm this relation between $\mathbf F$ and $\operatorname{curl}\mathbf F$ on a second surface. Let
$\Sigma_1$ be the flat disk whose boundary is $\partial S_1$. To make $\partial\Sigma_1 = \partial S_1$, that is, to make
the orientations match, the orienting normal for $\Sigma_1$ must point downward: $\mathbf n_1 = (0, 0, -1)$.
Let $\Sigma_2$ be the disk whose boundary is $\partial S_2$; here the orienting normal is
$\mathbf n_2 = (0, 0, +1)$. Finally, let $\Sigma = \Sigma_1 + \Sigma_2$. Then $\partial\Sigma = \partial S$, so

circulation of $\mathbf F$ around $\partial\Sigma$ = $4\pi R^2 H$,

as before. The total flux of $\operatorname{curl}\mathbf F$ through $\Sigma$ is once again a simple calculation. On
$\Sigma_1$, a point has coordinates $(x, y, H)$, so

$$\operatorname{curl}\mathbf F \cdot \mathbf n_1 = (x, y, -2H) \cdot (0, 0, -1) = +2H$$

there. Because this is a constant, the flux through $\Sigma_1$ is just

$$\Phi_1 = \operatorname{curl}\mathbf F \cdot \mathbf n_1\;\operatorname{area}\Sigma_1 = 2H \times \pi R^2 = 2\pi R^2 H.$$

On $\Sigma_2$, $\operatorname{curl}\mathbf F \cdot \mathbf n_2 = (x, y, 2H) \cdot (0, 0, 1) = 2H$ once again, and the flux is $\Phi_2 = 2\pi R^2 H$.
Hence,

circulation of $\mathbf F$ around $\partial\Sigma$ = total flux of $\operatorname{curl}\mathbf F$ through $\Sigma$.
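The equalities for the cylinder and for the pair of disks can also be confirmed by direct numerical integration. The following sketch uses a simple midpoint rule; the sample dimensions `R` and `H`, the grid sizes, and all function names are assumptions made for illustration.

```python
import math


def F(x, y, z):
    """The flow field of the example."""
    return (y * z, -x * z, 0.0)


def curl_F(x, y, z):
    """Its vorticity field."""
    return (x, y, -2.0 * z)


def circulation_on_circle(R, z, direction, n=2000):
    """Integrate F . t ds around the circle of radius R at height z.
    direction = +1 is counterclockwise seen from above, -1 is clockwise."""
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x, y = R * math.cos(theta), direction * R * math.sin(theta)
        dx, dy = -R * math.sin(theta), direction * R * math.cos(theta)
        Fx, Fy, _ = F(x, y, z)
        total += (Fx * dx + Fy * dy) * dtheta
    return total


def flux_through_cylinder(R, H, n_theta=200, n_z=200):
    """Integrate curl F . n dA over the side of the cylinder (outward n)."""
    dtheta, dz = 2.0 * math.pi / n_theta, 2.0 * H / n_z
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta
        nx, ny = math.cos(theta), math.sin(theta)
        for j in range(n_z):
            z = -H + (j + 0.5) * dz
            cx, cy, _ = curl_F(R * nx, R * ny, z)
            total += (cx * nx + cy * ny) * R * dtheta * dz
    return total


def flux_through_disk(R, z, normal_z):
    """Flux of curl F through the flat disk of radius R at height z whose
    orienting normal is (0, 0, normal_z); curl F . n is constant there."""
    return curl_F(0.0, 0.0, z)[2] * normal_z * math.pi * R * R


R, H = 2.0, 1.5  # sample dimensions, chosen arbitrarily
circulation = circulation_on_circle(R, H, -1) + circulation_on_circle(R, -H, +1)
flux_S = flux_through_cylinder(R, H)
flux_Sigma = flux_through_disk(R, H, -1.0) + flux_through_disk(R, -H, +1.0)
print(circulation, flux_S, flux_Sigma, 4.0 * math.pi * R * R * H)
```

All four printed numbers should agree up to rounding, since each integrand is constant on its domain.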


The surface Z 




With $S$ and $\Sigma$, we were able to determine total flux without calculating an integral.

The surface T

Here is a third surface, $T$, for which the integral is necessary. We define $T$ by a
parametrization $\mathbf f : U \to T$ (with $a$ to be determined):

$$\mathbf f : \begin{cases} x = a\cos u\cosh v,\\ y = a\sin u\cosh v,\\ z = v, \end{cases} \qquad U : \begin{cases} -\pi \le u \le \pi,\\ -H \le v \le H. \end{cases}$$

This is a surface of revolution around the z-axis; if $a = 1$, it is called a catenoid.
However, we want to choose $a$ so that $\partial T = \partial S_1 + \partial S_2$. Thus, in the first quadrant
in the $(x,z)$-plane (where $\cos u = 1$), we want $x = R$ when $z = H$. Consequently
$R = a\cosh H$, so $a = R/\cosh H$. Note: $\mathbf f$ is not 1-1 on $U$, so $T$ is not a surface patch,
strictly speaking. We should break up $T$ into two separate pieces. We can accomplish
that by breaking up $U$ into two pieces (e.g., with $-\pi \le u \le 0$ and $0 \le u \le \pi$) and
using the same formula $\mathbf f$ for each. But nothing essential is lost by treating these
two pieces together, as we do; see a similar comment (p. 396) about parametrizing
a sphere.

To determine the total flux of $\operatorname{curl}\mathbf F$ through $T$, we must pull back

$$d\omega = x\,dy\,dz + y\,dz\,dx - 2z\,dx\,dy$$

to $U$ using $\mathbf f$. We have

$$\mathbf f^*(dy\,dz) = a\cos u\cosh v\,du\,dv, \qquad \mathbf f^*(dz\,dx) = a\sin u\cosh v\,du\,dv,$$

$$\mathbf f^*(dx\,dy) = -a^2\sinh v\cosh v\,du\,dv,$$

from which it follows that

$$\mathbf f^*(d\omega) = a^2(\cosh^2 v + 2v\sinh v\cosh v)\,du\,dv.$$

Therefore, the total flux is

$$\iint_T d\omega = \iint_U \mathbf f^*(d\omega) = a^2\int_{-\pi}^{\pi} du \int_{-H}^{H} (\cosh^2 v + 2v\sinh v\cosh v)\,dv = a^2 \times 2\pi \times 2H\cosh^2 H.$$

(Note that $\cosh^2 v + 2v\sinh v\cosh v = (v\cosh^2 v)'$.) Because $a^2 = R^2/\cosh^2 H$, we
find

total flux of $\operatorname{curl}\mathbf F$ through $T$ = $4\pi R^2 H$,

agreeing with the values for $S$ and $\Sigma$.
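For the surface $T$ the integrand is no longer constant, so a numerical check is a little more informative. The sketch below integrates $\operatorname{curl}\mathbf F \cdot (\mathbf f_u \times \mathbf f_v)$ over $U$ with a midpoint rule and compares the result with $4\pi R^2 H$; the sample values of `R` and `H`, the grid sizes, and the function name are assumptions for illustration.

```python
import math


def flux_through_T(R, H, n_u=100, n_v=400):
    """Midpoint-rule approximation to the flux of curl F = (x, y, -2z)
    through the surface x = a cos u cosh v, y = a sin u cosh v, z = v,
    with a = R / cosh H, over -pi <= u <= pi, -H <= v <= H."""
    a = R / math.cosh(H)
    du, dv = 2.0 * math.pi / n_u, 2.0 * H / n_v
    total = 0.0
    for i in range(n_u):
        u = -math.pi + (i + 0.5) * du
        cu, su = math.cos(u), math.sin(u)
        for j in range(n_v):
            v = -H + (j + 0.5) * dv
            ch, sh = math.cosh(v), math.sinh(v)
            x, y, z = a * cu * ch, a * su * ch, v
            # Cross product f_u x f_v of the partial derivatives of f.
            nx, ny, nz = a * cu * ch, a * su * ch, -a * a * sh * ch
            total += (x * nx + y * ny - 2.0 * z * nz) * du * dv
    return total


R, H = 2.0, 1.0  # sample dimensions, chosen arbitrarily
print(flux_through_T(R, H), 4.0 * math.pi * R * R * H)
```

The two printed values should agree to several digits; the small discrepancy comes from the midpoint rule and shrinks as `n_v` grows.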


Of course it is not an accident that the total flux of $\operatorname{curl}\mathbf F$ through one of these surfaces
has the same value as through any other. It is a simple consequence of Stokes'
theorem and the fact that the three surfaces have the same boundary, including orientation.

Theorem 11.14. Suppose Stokes' theorem holds for the piecewise-smooth oriented
surfaces $S$ and $\Sigma$, and $\partial S = \partial\Sigma$. If $\omega$ is any 1-form defined on a region containing
$S$ and $\Sigma$, then

$$\iint_S d\omega = \iint_\Sigma d\omega.$$

Proof. The proof involves two applications of Stokes' theorem:

$$\iint_S d\omega = \oint_{\partial S} \omega = \oint_{\partial\Sigma} \omega = \iint_\Sigma d\omega.$$


□ 


The divergence theorem gives us another way to show that the total flux of $\operatorname{curl}\mathbf F$
through any two of the surfaces in our Example 3 must be equal. The key is that any
two of the surfaces, properly reoriented, form the total boundary of a 3-dimensional
region. For example, $S$ and $-\Sigma$ make up the boundary of the positively oriented
cylindrical region

$$\mathcal R : \quad x^2 + y^2 \le R^2, \quad -H \le z \le H.$$

Now apply the divergence theorem to the region $\mathcal R$ and the 2-form $d\omega$ on $\partial\mathcal R$; because
$d(d\omega) = 0$ on $\mathcal R$ (because $d^2 = 0$ always), we find

$$0 = \iiint_{\mathcal R} d(d\omega) = \iint_{\partial\mathcal R} d\omega = \iint_{S - \Sigma} d\omega = \iint_S d\omega - \iint_\Sigma d\omega.$$

Of course, this argument using the divergence theorem works equally well for any
pair of piecewise-smooth oriented surfaces that have a common boundary and that
together form the complete boundary (when properly reoriented) of a 3-dimensional
region.

In Theorem 11.8 and also in the discussion on pages 471-473, we established
that the fundamental link between circulation and curl has the form

$$\lim_{\delta(S) \to 0} \frac{\text{circulation of $\mathbf F$ around $\partial S$}}{\text{area of $S$}} = \operatorname{curl}\mathbf F(\mathbf a) \cdot \mathbf n,$$

where $S$ is centered at $\mathbf a$ and lies in the plane with normal $\mathbf n$ passing through $\mathbf a$; $\delta(S)$
is the diameter of $S$ (Definition 8.14, p. 291). However, to carry out the computations,
we needed $S$ to be either an ellipse or a rectangle. Stokes' theorem is a more
powerful computational tool; with it, we can now remove this restriction.

Theorem 11.15. Suppose the flow field $\mathbf F$ is defined on an open set $\Omega \subset \mathbb R^3$. Let
$S_k \subset \Omega$ be a sequence of closed surfaces with area that pass through a common
point $\mathbf a$ and lie in a common plane with unit normal $\mathbf n$. Suppose that the diameter
$\delta(S_k) \to 0$ as $k \to \infty$ and the boundary of each $S_k$ is a piecewise-smooth simple
closed curve. Then

$$\lim_{k \to \infty} \frac{\text{circulation of $\mathbf F$ around $\partial S_k$}}{\text{area of $S_k$}} = \operatorname{curl}\mathbf F(\mathbf a) \cdot \mathbf n,$$

where circulation is computed in the direction $\mathbf t$ along $\partial S_k$ for which the ordered
triple of vectors $\{\text{outward normal to $S_k$ in the plane}, \mathbf t, \mathbf n\}$ has the same orientation
as the coordinate axes.




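Proof. By Stokes' theorem, the circulation of $\mathbf F$ around $\partial S_k$ is

$$\oint_{\partial S_k} \mathbf F \cdot \mathbf t\,ds = \iint_{S_k} \operatorname{curl}\mathbf F \cdot \mathbf n\,dA.$$

By an adaptation of the law of the mean for double integrals (Theorem 3.7, p. 76),

$$\iint_{S_k} \operatorname{curl}\mathbf F \cdot \mathbf n\,dA = \operatorname{curl}\mathbf F(\mathbf a^*) \cdot \mathbf n\;\operatorname{area} S_k,$$

where $\mathbf a^*$ is a point in $S_k$; note that $\mathbf n$ is constant in the integral. Now let $k \to \infty$; then
$\delta(S_k) \to 0$, so $\mathbf a^* \to \mathbf a$. □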
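The limit in the theorem can be watched numerically by shrinking circles around a point. For the field of Example 3 the curl is linear, so the ratio is already exact for every radius; to see an actual limit, the sketch below uses a different, hypothetical test field $\mathbf F = (-y^3, x^3, 0)$, whose curl is $(0, 0, 3x^2 + 3y^2)$. The field, the center, and the function names are assumptions made for illustration.

```python
import math


def F(x, y, z):
    """Hypothetical test field, not from the text; curl F = (0, 0, 3x^2 + 3y^2)."""
    return (-y**3, x**3, 0.0)


def circulation_per_unit_area(center, r, n=4000):
    """Circulation of F around the circle of radius r about `center`, lying in
    the horizontal plane (normal e3, counterclockwise seen from above),
    divided by the area pi*r^2 of the enclosed disk."""
    cx, cy, cz = center
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x, y = cx + r * math.cos(theta), cy + r * math.sin(theta)
        dx, dy = -r * math.sin(theta), r * math.cos(theta)
        Fx, Fy, _ = F(x, y, cz)
        total += (Fx * dx + Fy * dy) * dtheta
    return total / (math.pi * r * r)


a = (1.0, 0.0, 0.0)  # sample point; curl F(a) . e3 = 3 there
for r in (0.5, 0.1, 0.01):
    print(r, circulation_per_unit_area(a, r))
```

For this field the ratio works out to $3 + \tfrac32 r^2$, so the printed values approach $\operatorname{curl}\mathbf F(\mathbf a) \cdot \mathbf e_3 = 3$ as the radius shrinks.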

The theorem calls our attention to the quantity

$$q(\mathbf a, \mathbf n) = \lim_{k \to \infty} \frac{\text{circulation of $\mathbf F$ around $\partial S_k$}}{\text{area of $S_k$}}$$

that depends on the point $\mathbf a$ and the unit normal $\mathbf n$, but not on the particular sets $S_k$
used to define it; let us call it the circulation of $\mathbf F$ per unit area at the point $\mathbf a$ in the
direction $\mathbf n$. Now let

$$q_i(\mathbf a) = q(\mathbf a, \mathbf e_i), \quad i = 1, 2, 3,$$

where $\{\mathbf e_1, \mathbf e_2, \mathbf e_3\}$ is the standard basis in $\mathbb R^3$.

Corollary 11.16 $\operatorname{curl}\mathbf F(\mathbf a) = (q_1(\mathbf a), q_2(\mathbf a), q_3(\mathbf a))$. □

This corollary serves, as Lemmas 11.2-11.4 did, to give us insight into the way the 
vorticity vector curlF is linked to F itself. It provides us with an alternative to the 
physical imagery of a little ball set spinning by the shearing action of a fluid flow 
represented by F. 


11.4 Closed and exact forms 

In this section, we use differential forms to define and to solve ordinary and partial
differential equations. When the differential forms have certain special properties
(e.g., when they are closed or exact), the solutions can be particularly simple and elegant.
In a different vein, we also show how those special forms give us information
about the geometry of the domains on which they are defined.

Analytic methods for solving differential equations frequently involve integration,
or quadrature, as it is traditionally called in this context. For example, the
solution of the basic equation $dy/dx = f(x)$ is a quadrature:

$$y = \int f(x)\,dx.$$

The integral represents the infinite collection of functions $F(x) + c$, where $F$ is a
specific antiderivative of $f$ (i.e., $F'(x) = f(x)$), and $c$ is an arbitrary constant. We
say the solutions form a one-parameter family; $c$ is the parameter.

For the more complicated equation $dy/dx = f(x)/g(y)$, a solution is a function
$y = \varphi(x)$ for which

$$\varphi'(x) = \frac{f(x)}{g(\varphi(x))} \quad\text{for all $x$ in some nonempty interval.}$$

To find $\varphi$, first rewrite the differential equation as an "equation with differentials"
in such a way that the two variables are separated:

$$g(y)\,dy - f(x)\,dx = 0.$$

This implies $G(y) - F(x) = c$, where $c$ is an arbitrary constant, and $G$ and $F$ are specific
antiderivatives of $g$ and $f$, respectively. For each $c$ that we specify, the equation
$G(y) - F(x) = c$ is a relation between $x$ and $y$ that defines $y$ implicitly as a function
of $x$ (cf. Chapter 6.1) if there is a "seed point" $(x, y) = (a, b)$ for which

$$G(b) - F(a) = c \quad\text{and}\quad G'(b) \ne 0.$$

The implicit function $y = \varphi_c(x)$ that is supplied by Theorem 6.1, page 189 (and by
the specified $c$), has $\varphi_c(a) = b$ and satisfies the conditions

$$G(\varphi_c(x)) - F(x) = c \quad\text{and}\quad \varphi_c'(x) = -\frac{-F'(x)}{G'(\varphi_c(x))} = \frac{f(x)}{g(\varphi_c(x))}$$

at all points $x$ on some open interval including $x = a$. Thus $\varphi_c(x)$ is indeed a solution
to the differential equation. As in the simpler case, $c$ serves to parametrize an infinite
family of such solutions.

The function $G(y) - F(x)$ is called a primitive, or first integral, of the differential
equation. The first name is suggested by the fact that solutions emerge from it
(as implicitly defined functions). The second name is suggested by the fact that we
can write it as an integral:

$$G(y) - F(x) = \int g(y)\,dy - \int f(x)\,dx.$$

Simply put, we solve the differential equation by integrating it to obtain a first
integral/primitive that defines solutions implicitly.
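To see a first integral at work, take the hypothetical separable equation $dy/dx = x/y$ (so $f(x) = x$ and $g(y) = y$), for which $G(y) - F(x) = \tfrac12 y^2 - \tfrac12 x^2$. The sketch below integrates the equation with a classical fourth-order Runge-Kutta step and checks that the first integral keeps the value it had at the seed point; the equation, the seed point $(0, 1)$, and the function names are assumptions chosen for illustration.

```python
import math


def slope(x, y):
    """Right-hand side f(x)/g(y) of the sample equation dy/dx = x/y."""
    return x / y


def first_integral(x, y):
    """G(y) - F(x) with G(y) = y^2/2 and F(x) = x^2/2."""
    return 0.5 * y * y - 0.5 * x * x


def rk4_solve(x0, y0, x1, steps=1000):
    """Integrate dy/dx = slope(x, y) from (x0, y0) to x = x1 with classical RK4."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + h / 2.0, y + h * k1 / 2.0)
        k3 = slope(x + h / 2.0, y + h * k2 / 2.0)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y


x0, y0 = 0.0, 1.0                  # seed point (a, b); here G'(b) = b is nonzero
c = first_integral(x0, y0)         # the constant c fixed by the seed point
y_end = rk4_solve(x0, y0, 2.0)
print(c, first_integral(2.0, y_end), y_end, math.sqrt(1.0 + 2.0**2))
```

The implicit relation $\tfrac12 y^2 - \tfrac12 x^2 = \tfrac12$ gives $y = \sqrt{1 + x^2}$ on this branch, which is what the numerical solution tracks.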

There is a larger class of differential equations, called exact, that can be integrated
the same way. An exact differential equation has the form

$$\Phi_x(x, y)\,dx + \Phi_y(x, y)\,dy = 0;$$

it has this name because the left-hand side is exactly equal to the differential (i.e.,
the exterior derivative) of the function $\Phi(x, y)$: $d\Phi = \Phi_x\,dx + \Phi_y\,dy$. A solution is a
function $y = \varphi(x)$ for which

$$\Phi_x(x, \varphi(x)) + \Phi_y(x, \varphi(x))\,\varphi'(x) = 0$$

for all $x$ in some nonempty interval. (In other words, the given differential equation
is satisfied when we substitute $\varphi(x)$ for $y$ and $\varphi'(x)\,dx$ for $dy$.) Because we can write
the differential equation as $d\Phi = 0$, we have

$$\Phi(x, y) = \int d\Phi = c,$$

implying that $\Phi$ is a first integral for the differential equation. In other words, if we
fix $c$ and find a "seed point" $(x, y) = (a, b)$ for which

$$\Phi(a, b) = c \quad\text{and}\quad \Phi_y(a, b) \ne 0,$$

then the implicit function theorem provides a function $y = \varphi_c(x)$ with $\varphi_c(a) = b$ and
for which

$$\Phi(x, \varphi_c(x)) = c \quad\text{and}\quad \varphi_c'(x) = \frac{-\Phi_x(x, \varphi_c(x))}{\Phi_y(x, \varphi_c(x))}$$

for all $x$ on some open interval containing $x = a$. Thus $\varphi_c$ is a solution to the differential
equation.


Primitives and 
first integrals 


Exact differential
equations


Examples


Exact forms 


Closed forms 


An integrability 
condition 


The differential equation $y\,dx + x\,dy = 0$ is exact. One first integral is $\Phi = xy$, and
the solutions are the functions $\varphi_c(x) = c/x$. In other cases, it may be more difficult
to see whether the differential equation is exact. For example,

$$\frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = 0, \quad (x, y) \ne (0, 0),$$

is exact; its first integral is

$$\theta(x, y) = \arctan\left(\frac{y}{x}\right),$$

as can be verified immediately (cf. pp. 429-431). The solution implicitly defined by
the equation $\theta = c$ is the linear function $\varphi_c(x) = (\tan c)\,x$, $x \ne 0$. These solutions
form a one-parameter family whose graphs are the straight lines that radiate from
the origin. We cannot expect the solutions to be defined at the origin because the
differential equation itself is not defined there. The example has an even more remarkable
feature: the first integral $\theta(x, y)$ must be multiple-valued if it is to avoid
discontinuities on the punctured plane. See the graph of $z = \theta(x, y)$ on page 430.

Each differential equation we have been considering can be written in the form
$\omega = 0$ when $\omega$ is the general 1-form $P(x, y)\,dx + Q(x, y)\,dy$. In terms of $\omega$, an exact
differential equation is one for which $\omega = d\Phi$ for some 0-form $\Phi$. In this case, we
now say $\omega$ itself is exact, and then extend this definition to general $k$-forms.

Definition 11.5 A differential $k$-form $\omega$ in $n$ variables is said to be exact if there is
a $(k - 1)$-form $\alpha$ for which $\omega = d\alpha$.

When $\omega = d\alpha$ is exact, then $d\omega = d^2\alpha = 0$ (because $d^2 = 0$ by Theorem 10.17,
p. 439). Recall that the exterior derivative "$d$" and the boundary operator "$\partial$" are
paired as adjoints (cf. p. 428). We use the term closed for a curve or surface $S$ that
has zero boundary, $\partial S = 0$; therefore the pairing suggests the same term, closed, for
a differential form $\omega$ that has zero exterior derivative, $d\omega = 0$.

Definition 11.6 We say the $k$-form $\omega$ is closed if $d\omega = 0$.

These definitions lead to the following conclusion. 

Corollary 11.17 Every exact form is closed. □ 

The corollary gives us a necessary condition for a differential equation of the
more general form

$$\omega = P(x, y)\,dx + Q(x, y)\,dy = 0$$

to be exact: we must have $Q_x = P_y$ (because $d\omega = (Q_x - P_y)\,dx\,dy$). This also follows
from the equality of mixed partial derivatives, for if $\omega$ is an exact 1-form, with
$\omega = d\Phi = \Phi_x\,dx + \Phi_y\,dy$, then $P = \Phi_x$, $Q = \Phi_y$, and

$$Q_x = (\Phi_y)_x = (\Phi_x)_y = P_y.$$

Because an exact differential equation is "integrable" (i.e., solvable by integrations),
we think of $Q_x = P_y$ as an integrability condition, in this case, a necessary condition.
As we show below, $Q_x = P_y$ is also a sufficient condition for the integrability
of $\omega = 0$, at least locally.

The integrability condition shows that, for example, $x\,dy - y\,dx = 0$ cannot be
exact ($Q_x - P_y = 2$). Nevertheless, when we rewrite this differential equation as

\[ \frac{x\,dy - y\,dx}{x^2 + y^2} = 0, \]

it becomes exact. It has the first integral $\arctan(y/x)$, as noted above. Because the
added factor $1/(x^2 + y^2)$ makes the differential equation integrable, we call it an
integrating factor. Another integrating factor is $1/x^2$, because

\[ \frac{x\,dy - y\,dx}{x^2} = -\frac{y}{x^2}\,dx + \frac{1}{x}\,dy = d\!\left(\frac{y}{x}\right). \]

In this case the first integral is $y/x$, not $\arctan(y/x)$, but the solution graphs are
unchanged; they are the same straight lines that radiate from the origin. With yet
another integrating factor, $1/xy$, the differential equation even becomes separable:

\[ \frac{x\,dy - y\,dx}{xy} = \frac{dy}{y} - \frac{dx}{x} = 0. \]

Here the first integral is $\ln|y/x|$; it leads once again to the same solution graphs. Of
course most differential equations fail to be exact and fail to have integrating factors
that make them exact. We leave further discussion of the art of finding integrating
factors to texts on differential equations.
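The integrability condition is easy to test numerically. The following is a minimal sketch, not part of the original text: the helper names `partial` and `integrability_defect`, the sample point, and the step size are all illustrative assumptions. It estimates $Q_x - P_y$ by central differences for $x\,dy - y\,dx$ multiplied by each of the three integrating factors discussed above.

```python
# Finite-difference check of the integrability condition Q_x = P_y
# for the form mu * (-y dx + x dy), with several choices of the factor mu.
# All names here are illustrative; they do not come from the text.

def partial(f, x, y, wrt, h=1e-6):
    """Central-difference partial derivative of f(x, y) with respect to 'x' or 'y'."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def integrability_defect(mu, x, y):
    """Return Q_x - P_y for the form mu * (-y dx + x dy) at the point (x, y)."""
    P = lambda u, v: -v * mu(u, v)   # coefficient of dx
    Q = lambda u, v: u * mu(u, v)    # coefficient of dy
    return partial(Q, x, y, "x") - partial(P, x, y, "y")

factors = {
    "1 (no factor)": lambda u, v: 1.0,
    "1/(x^2+y^2)":   lambda u, v: 1.0 / (u * u + v * v),
    "1/x^2":         lambda u, v: 1.0 / (u * u),
    "1/(xy)":        lambda u, v: 1.0 / (u * v),
}

# Evaluate at an arbitrary sample point away from the coordinate axes.
defects = {name: integrability_defect(mu, 1.3, 0.7) for name, mu in factors.items()}
for name, value in defects.items():
    print(f"{name:15s} Q_x - P_y = {value:.6f}")
```

Without a factor the defect is close to 2, in agreement with the text; with each integrating factor it is zero up to discretization and rounding error.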

Every exact form is closed; is every closed form exact? The closed form

\[ \omega = \frac{-y\,dx + x\,dy}{x^2 + y^2} \]

reveals a difficulty. The form is defined everywhere in the punctured plane $\mathbb{P} =
\mathbb{R}^2 \setminus (0,0)$, but there is no continuously differentiable, single-valued function $\Phi(x,y)$
on $\mathbb{P}$ for which $d\Phi = \omega$ (see Exercise 11.12). As pointed out above (and on pp. 429-
431), the angle function $\theta(x,y) = \arctan(y/x)$ does have $d\theta = \omega$ everywhere on $\mathbb{P}$,
but it is multiple-valued. There is no way to assign a unique angle to every point in
$\mathbb{P}$ without having discontinuities. (There is a similar difficulty with the earth's time
zones: time increases steadily in the eastward direction, until the International Date
Line, where it drops back 24 hours.) The obstruction to continuity disappears if we
restrict $\omega$ to a domain that has no closed path encircling the origin. For example, we
can use a disk or a rectangle that excludes the origin. More generally, we have the
following result for closed 1-forms in two variables.

Integrating factors

Is a closed form exact?

Theorem 11.18. Suppose the 1-form $\omega = P(x,y)\,dx + Q(x,y)\,dy$ is defined and
closed on a rectangular window $W$ centered at a point $(a,b)$ in $\mathbb{R}^2$. Then $\omega(x,y) =
d\Phi(x,y)$ for every $(x,y)$ in $W$, where

\[ \Phi(x,y) = \int_a^x P(t,b)\,dt + \int_b^y Q(x,t)\,dt. \]

[Figure: the window $W$, showing the points $(a,b)$, $(x,b)$, and $(x,y)$.]

Local exactness

From partial to ordinary differential equations

Proof. It suffices to show that $\Phi_x = P$, $\Phi_y = Q$ in $W$. Because $y$ does not appear in
the first integral, and appears only in the upper limit of integration in the second, the
equality $\Phi_y = Q$ is immediate. To verify the other equality, we use the integrability
condition $Q_x = P_y$ to write

\[
\begin{aligned}
\Phi_x(x,y) &= P(x,b) + \int_b^y Q_x(x,t)\,dt = P(x,b) + \int_b^y P_y(x,t)\,dt \\
&= P(x,b) + P(x,t)\Big|_{t=b}^{t=y} \\
&= P(x,b) + P(x,y) - P(x,b) = P(x,y).
\end{aligned}
\]

□
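The formula for $\Phi$ in Theorem 11.18 can be evaluated by ordinary quadrature. The sketch below is illustrative only: the Simpson-rule helper, the sample form $2xy\,dx + (x^2 + \cos y)\,dy$, and the base point are assumptions chosen for the example, not taken from the text. It compares the quadrature value with the known potential $x^2y + \sin y$, shifted so that it vanishes at $(a,b)$.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson approximation of the integral of g from lo to hi (n even)."""
    if lo == hi:
        return 0.0
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def potential(P, Q, a, b, x, y):
    """Phi(x, y) = int_a^x P(t, b) dt + int_b^y Q(x, t) dt, as in Theorem 11.18."""
    return simpson(lambda t: P(t, b), a, x) + simpson(lambda t: Q(x, t), b, y)

# Assumed example: a closed 1-form, since Q_x = P_y = 2x.
P = lambda x, y: 2 * x * y
Q = lambda x, y: x * x + math.cos(y)
a, b = 0.5, 0.25   # base point (center of the window)

# A known potential, normalized so that it is zero at (a, b).
exact = lambda x, y: (x * x * y + math.sin(y)) - (a * a * b + math.sin(b))

x, y = 0.8, -0.5
phi = potential(P, Q, a, b, x, y)
print(phi, exact(x, y))
```

The two printed values should agree to many decimal places; the path of integration runs horizontally from $(a,b)$ to $(x,b)$ and then vertically to $(x,y)$.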


Our goal is to prove this result in full generality: to show that a $k$-form in $n$ variables
that is closed inside a rectangular parallelepiped (a "window") is exact there.
We say that such a closed form is locally exact.

Consider how Theorem 11.18 accomplished the goal. The function $\Phi(x,y)$ for
which

\[ d\Phi = \Phi_x\,dx + \Phi_y\,dy = P\,dx + Q\,dy = \omega \]

is a solution to the pair of partial differential equations

\[ \Phi_x = P, \qquad \Phi_y = Q, \]

where $P(x,y)$ and $Q(x,y)$ are given functions that satisfy the integrability condition
$Q_x = P_y$. But the theorem presents $\Phi(x,y)$ as $F_b(x) + G_x(y)$, where $F_b$ and $G_x$ are
the particular solutions of the ordinary differential equations

\[ F_b'(t) = P(t,b), \qquad G_x'(t) = Q(x,t), \]

that satisfy the initial conditions

\[ F_b(a) = 0, \qquad G_x(b) = 0. \]

In the first function, $b$ is a parameter; in the second, $x$ is.

In general, showing that a closed $k$-form in $n$ variables is locally exact reduces to
solving a set of partial differential equations in the presence of certain integrability
conditions. For example, take $k = 2$, $n = 3$, and suppose

\[ \omega = P(x,y,z)\,dy\,dz + Q(x,y,z)\,dz\,dx + R(x,y,z)\,dx\,dy \]

is closed; this implies $P_x + Q_y + R_z = 0$. If $\omega$ is to be exact, we need a 1-form

\[ \alpha = A(x,y,z)\,dx + B(x,y,z)\,dy + C(x,y,z)\,dz \]

with $d\alpha = \omega$; this implies

\[ C_y - B_z = P, \qquad A_z - C_x = Q, \qquad B_x - A_y = R. \]

These are three partial differential equations for the three unknown functions $A$, $B$,
and $C$, together with one integrability condition $P_x + Q_y + R_z = 0$ imposed on the
known functions $P$, $Q$, and $R$. More generally, local exactness of a closed $k$-form in
$n$ variables involves $\binom{n}{k}$ partial differential equations for $\binom{n}{k-1}$ unknown functions
together with $\binom{n}{k+1}$ integrability conditions (see Exercise 11.25). To prove local
exactness for a general $k$-form by the approach of Theorem 11.18, we must first
reduce the partial differential equations with their integrability conditions into ordinary
differential equations whose solutions (expressed as ordinary integrals) supply
the coefficients of the needed $(k-1)$-form.
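The three counts are quick to tabulate. The following short sketch (the function name `counts` is an illustrative assumption) lists the number of equations, unknowns, and integrability conditions for small $k$ and $n$, and confirms the values 3, 3, and 1 for the case $k = 2$, $n = 3$ just described.

```python
from math import comb

def counts(k, n):
    """Return (equations, unknowns, integrability conditions) for a k-form in n variables."""
    return comb(n, k), comb(n, k - 1), comb(n, k + 1)

case_2_3 = counts(2, 3)   # the case worked out in the text
print("k=2, n=3:", case_2_3)

for n in range(2, 6):
    for k in range(1, n + 1):
        eqs, unknowns, conds = counts(k, n)
        print(f"k={k}, n={n}: {eqs} equations, {unknowns} unknowns, {conds} conditions")
```

When $k = n$ the number of conditions is zero, which matches the later remark that a $k$-form in $k$ variables is automatically closed.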

In "The Poincaré Lemma and an Elementary Construction of Vector Potentials"
[22], Shirley Llamado Yap introduces an algorithm for carrying out this approach.
The algorithm constructs a solution for every $k$ and $n \ge k$, using induction
on $n$. Because the argument involves a flurry of subscripts, we first step through
the (subscript-free) example with $k = 2$ and $n = 3$ we have just introduced. Thus
we are given three functions $P(x,y,z)$, $Q(x,y,z)$, and $R(x,y,z)$ that are defined in
a window centered at $(x,y,z) = (a,b,c)$. They satisfy the integrability condition
$P_x + Q_y + R_z = 0$. We seek three functions $A(x,y,z)$, $B(x,y,z)$, and $C(x,y,z)$ that
satisfy the three partial differential equations

\[ C_y - B_z = P, \qquad A_z - C_x = Q, \qquad B_x - A_y = R \]

in that window.

Solving the case $k = 2$, $n = 3$

Step 1

A system of partial differential equations typically has many solutions. We seek a
solution in which $C(x,y,z) = 0$. In that case the first two equations reduce to differentiation
with respect to $z$ alone: $A_z = Q$, $B_z = -P$. By treating $x$ and $y$ as parameters,
we can think of these as ordinary differential equations in $z$, whose solutions are
then given by integration:

\[
\begin{aligned}
A(x,y,z) &= A(x,y,c) + \int_c^z Q(x,y,t)\,dt, \\
B(x,y,z) &= B(x,y,c) - \int_c^z P(x,y,t)\,dt.
\end{aligned}
\]

The first equation expresses the values of $A$ off the plane $z = c$ in terms of its values
on that plane (and on the values of $Q$ in the window). But we have not yet determined
the values of $A$ on the plane.

The situation is similar for $B$, but, following the approach we took with $C$, we
seek a solution in which $B(x,y,c) = 0$. In that case, the equation for $B$ reduces to

\[ B(x,y,z) = -\int_c^z P(x,y,t)\,dt. \]

Step 2

Now consider the third partial differential equation, $B_x - A_y = R$. For the moment,
we look for a solution only on the plane $z = c$; in Step 3, we remove this restriction.
(The move from the plane to 3-space becomes the induction step in the general
algorithm.) On $z = c$, we have $B = 0$ by Step 1, so $B_x - A_y = R$ reduces to

\[ A_y(x,y,c) = -R(x,y,c). \]

We can treat this as an ordinary differential equation in $y$ (with $x$ and $c$ as parameters);
its solution is the integral

\[ A(x,y,c) = A(x,b,c) - \int_b^y R(x,t,c)\,dt. \]

By analogy with what we have done with $C$ and $B$, we seek a solution for which
$A(x,b,c) = 0$; then

\[ A(x,y,c) = -\int_b^y R(x,t,c)\,dt. \]

Combining the results from Steps 1 and 2, we obtain the following formulas for
$A$ and $B$ that are defined in the entire window:

\[
\begin{aligned}
A(x,y,z) &= -\int_b^y R(x,t,c)\,dt + \int_c^z Q(x,y,t)\,dt, \\
B(x,y,z) &= -\int_c^z P(x,y,t)\,dt.
\end{aligned}
\]

Step 3

But we have not yet verified that $A$ and $B$, as defined by these formulas, satisfy
the third partial differential equation everywhere in the window. In Step 2, we constructed
$A$ and $B$ to satisfy that third equation only on the plane $z = c$.

To show that $A$ and $B$ also satisfy the third equation when $z \ne c$, we take the
following approach. We can write the third equation as $E = 0$, where

\[ E(x,y,z) = B_x(x,y,z) - A_y(x,y,z) - R(x,y,z). \]

By Step 2, $E$ equals zero when $z = c$ (and $(x,y,z)$ lies in the window); we must show
that $E$ remains equal to zero for all $z$ in some open neighborhood of $z = c$. We claim
that the derivative of $E$ with respect to $z$ is zero; it will then follow that the value of
$E$ does not change, and will thus remain equal to zero, as $z$ moves away from $c$.
To prove the claim, we invoke the integrability condition $P_x + Q_y + R_z = 0$:

\[ \frac{\partial E}{\partial z} = B_{xz} - A_{yz} - R_z = (B_z)_x - (A_z)_y - R_z = -P_x - Q_y - R_z = 0. \]

This completes the construction of the 1-form $\alpha$ for which $d\alpha = \omega$, and thus
proves the Poincaré lemma in this case.

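Theorem 11.19. If $\omega = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy$ is closed in a window centered
at $(x,y,z) = (a,b,c)$, then $\omega = d\alpha$, where $\alpha = A\,dx + B\,dy + C\,dz$ and

\[
\begin{aligned}
A(x,y,z) &= -\int_b^y R(x,t,c)\,dt + \int_c^z Q(x,y,t)\,dt, \\
B(x,y,z) &= -\int_c^z P(x,y,t)\,dt, \\
C(x,y,z) &= 0.
\end{aligned}
\]

□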
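The formulas of Theorem 11.19 can be checked numerically on a sample closed 2-form. The sketch below is an illustration under stated assumptions: the coefficients $P = xz$, $Q = yz$, $R = -z^2$ (for which $P_x + Q_y + R_z = 0$), the base point $(a,b,c) = (0,0,1)$, and the helper names are all chosen for the example. It builds $A$ and $B$ by Simpson quadrature, differentiates them by central differences, and compares the coefficients of $d\alpha$ with $P$, $Q$, $R$.

```python
def simpson(g, lo, hi, n=100):
    """Composite Simpson approximation of the integral of g from lo to hi (n even)."""
    if lo == hi:
        return 0.0
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

# Assumed example: P_x + Q_y + R_z = z + z - 2z = 0, so the 2-form is closed.
P = lambda x, y, z: x * z
Q = lambda x, y, z: y * z
R = lambda x, y, z: -z * z
a, b, c = 0.0, 0.0, 1.0   # center of the window

def A(x, y, z):
    """A = -int_b^y R(x, t, c) dt + int_c^z Q(x, y, t) dt."""
    return -simpson(lambda t: R(x, t, c), b, y) + simpson(lambda t: Q(x, y, t), c, z)

def B(x, y, z):
    """B = -int_c^z P(x, y, t) dt."""
    return -simpson(lambda t: P(x, y, t), c, z)

def C(x, y, z):
    return 0.0

def d(f, i, p, h=1e-5):
    """Central-difference partial derivative of f in coordinate i at the point p."""
    hi, lo = list(p), list(p)
    hi[i] += h
    lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

def d_alpha(p):
    """Coefficients of dy dz, dz dx, dx dy in d(A dx + B dy + C dz)."""
    return (d(C, 1, p) - d(B, 2, p),
            d(A, 2, p) - d(C, 0, p),
            d(B, 0, p) - d(A, 1, p))

p = (0.4, -0.3, 1.6)
computed = d_alpha(p)
expected = (P(*p), Q(*p), R(*p))
print(computed)
print(expected)
```

The third component is the interesting one: it is the equation that the construction enforces only on the plane $z = c$, and the agreement at a point with $z \ne c$ reflects Step 3 of the argument.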


11.4 Closed and exact forms 




$\omega = d(\alpha + d\Phi)$ for any $\Phi$

The 1-form $\alpha$ that makes $\omega$ locally exact is not unique. If $\beta$ also makes $\omega$ locally
exact (i.e., $d\beta = \omega$), then

\[ d(\beta - \alpha) = \omega - \omega = 0. \]

Hence $\beta - \alpha$ is a closed 1-form, so by the Poincaré lemma for 1-forms (Theorem 11.18,
as extended to higher dimensions in Exercises 11.23 and 11.24), $\beta - \alpha$
is itself locally exact: $\beta - \alpha = d\Phi$ for some 0-form $\Phi$. The most general 1-form
that makes $\omega$ locally exact is thus $\alpha + d\Phi$, where $\Phi$ is an arbitrary 0-form.

The general algorithm

We move on now to Yap's general algorithm for constructing a $(k-1)$-form $\alpha$
that makes a closed $k$-form $\omega$ locally exact: $\omega = d\alpha$. The construction proceeds
inductively on the dimension $n$ of the space on which the forms are defined. For
simplicity, we assume that $\omega$ is defined on a window (rectangular parallelepiped)
centered at the origin in $\mathbb{R}^n$.

The base, with $n = k$

Throughout the argument, $k$ is fixed. The induction on $n$ begins with $n = k$. In
this case, a $k$-form $\omega$ has only a single term,

\[ \omega = P(x_1,\dots,x_k)\,dx_1 \cdots dx_k, \]

and $\omega$ is automatically closed. Let us take

\[ \alpha = A(x_1,\dots,x_k)\,dx_1 \cdots dx_{k-1}; \]

then

\[ d\alpha = \frac{\partial A}{\partial x_k}\,dx_k\,dx_1 \cdots dx_{k-1} = (-1)^{k-1}\frac{\partial A}{\partial x_k}\,dx_1 \cdots dx_{k-1}\,dx_k. \]

Thus $\omega = d\alpha$ if $A$ satisfies the partial differential equation

\[ \frac{\partial A}{\partial x_k} = (-1)^{k-1} P. \]

We can solve this differential equation immediately by integration:

\[ A(x_1,\dots,x_k) = (-1)^{k-1}\int_0^{x_k} P(x_1,\dots,x_{k-1},t)\,dt. \]

This completes the construction of $\alpha$ when $n = k$.
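As a small worked instance of the base case, take $k = n = 2$ with coordinates $(x_1, x_2) = (x, y)$ and the sample coefficient $P = x + y$ (an illustrative choice, not from the text). Choosing $A$ to vanish on $y = 0$, the construction reads:

```latex
\[
\omega = (x + y)\,dx\,dy, \qquad \alpha = A(x,y)\,dx, \qquad
\frac{\partial A}{\partial y} = (-1)^{2-1}(x + y) = -(x + y),
\]
\[
A(x,y) = -\int_0^y (x + t)\,dt = -xy - \tfrac{1}{2}y^2,
\]
\[
d\alpha = \frac{\partial A}{\partial y}\,dy\,dx = -(x + y)\,dy\,dx = (x + y)\,dx\,dy = \omega .
\]
```

The sign $(-1)^{k-1}$ comes entirely from moving $dy$ past the single factor $dx$.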


The induction, with $n > k$

Now take $n > k$ and use induction. That is, assume the algorithm works for
$k$-forms in $\mathbb{R}^{n-1}$ and then show that it works for $k$-forms in $\mathbb{R}^n$. The arguments
make extensive use of multi-indices; see pages 439-443.

We are given a closed $k$-form

\[ \omega = \sum_I P_I(x_1,\dots,x_n)\,dx_I, \qquad I = (i_1,\dots,i_k), \]

with $1 \le i_1 < \cdots < i_k \le n$. We want to find a $(k-1)$-form

\[ \alpha = \sum_{I_s} A_{I_s}(x_1,\dots,x_n)\,dx_{I_s}, \qquad I_s = (i_1,\dots,\widehat{i_s},\dots,i_k), \]

(the hat marks an omitted index) for which $\omega = d\alpha$ in an open neighborhood of $x = 0$. Because

\[ d\alpha = \sum_I \left( \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial A_{I_s}}{\partial x_{i_s}} \right) dx_I \]

(Theorem 10.19, p. 441), the condition $\omega = d\alpha$ yields the following $\binom{n}{k}$ partial
differential equations

\[ P_I = \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial A_{I_s}}{\partial x_{i_s}} \]

for the $\binom{n}{k-1}$ unknown functions $A_J$. To obtain these functions, we follow the same
steps as in the example above (pp. 497-499).

Step 1


First, restrict the multi-index $I$ to the case where $i_k = n$. Consider the new multi-index
$J = (i_1,\dots,i_{k-1})$ with $i_{k-1} \le n - 1$. If we define $J_s$ by analogy with $I_s$, then
the restriction $i_k = n$ means that $I = J,n$ and

\[ I_s = J_s,n \ \text{ if } s < k, \qquad I_k = J. \]

Now consider the partial differential equation for which $I = J,n$:

\[ P_{J,n} = \frac{\partial A_{J_1,n}}{\partial x_{i_1}} - \frac{\partial A_{J_2,n}}{\partial x_{i_2}} + \cdots + (-1)^{k-1}\frac{\partial A_J}{\partial x_n}. \]

Following the example, we begin the process of determining the functions $A_J$ by
setting

\[ A_{J_1,n}(x_1,\dots,x_n) = \cdots = A_{J_{k-1},n}(x_1,\dots,x_n) = 0. \]

Because the multi-index $J_s$ selects $k - 2$ distinct elements from the first $n - 1$ positive
integers, these equations for $A_{J_s,n}$ determine $\binom{n-1}{k-2}$ of the functions we seek.

Each partial differential equation for which $I = J,n$ thus reduces to a single term
on the right and involves only one unknown function, $A_J$. We write this equation in
the form

\[ \frac{\partial A_J}{\partial x_n} = (-1)^{k-1} P_{J,n}. \]

Integration yields

\[ A_J(x_1,\dots,x_n) = A_J(x_1,\dots,x_{n-1},0) + (-1)^{k-1}\int_0^{x_n} P_{J,n}(x_1,\dots,x_{n-1},t)\,dt. \]

This determines $A_J$ in terms of its values on the hyperplane $x_n = 0$ (and the values of
the known function $P_{J,n}$ in the window), but we have not yet determined the values
of $A_J$ on that hyperplane.

There are $\binom{n-1}{k-1}$ functions $A_J$ defined by these integrals; together with the $\binom{n-1}{k-2}$
functions already set equal to zero, we have identified all the $\binom{n}{k-1}$ unknown functions,
because (cf. Exercise 11.26)

\[ \binom{n-1}{k-1} + \binom{n-1}{k-2} = \binom{n}{k-1}. \]

Step 2

We now determine the functions $A_J$ on the hyperplane $x_n = 0$. In Step 1 we
exhausted the possibility that $i_k = n$, so from this point on we take $i_k < n$. This
means that $dx_n$ no longer appears in any basic differential $dx_I$. Because $x_n$ itself no
longer appears as a variable on the hyperplane $x_n = 0$, we have reduced the setting
to differential forms on $\mathbb{R}^{n-1}$. Therefore, if we can pull back $\omega$ and $\alpha$ to forms $\omega^*$
and $\alpha^*$ on $\mathbb{R}^{n-1}$ in such a way that $d\omega^* = 0$ and $d\alpha^* = \omega^*$ on $\mathbb{R}^{n-1}$, we can use the
induction hypothesis to obtain the functions $A_J$.

Consider the map $\mathbf{f}\colon \mathbb{R}^{n-1} \to \mathbb{R}^n \colon y \mapsto x$ into the hyperplane $x_n = 0$:

\[ \mathbf{f}\colon \quad x_1 = y_1, \quad \dots, \quad x_{n-1} = y_{n-1}, \quad x_n = 0. \]

By Theorem 10.20, page 443, and the discussion preceding it, the pullback of $\mathbf{f}$ on
a basic $k$-form is

\[ \mathbf{f}^* dx_I = \begin{cases} dy_I & \text{if } i_k < n, \\ 0 & \text{if } i_k = n. \end{cases} \]

This suggests we define a new multi-index $I^* = I$ with the restriction that $i_k \le n - 1$.
The pullback of $\omega$ is then

\[ \omega^*(y_1,\dots,y_{n-1}) = \mathbf{f}^*\omega = \sum_{I^*} P_{I^*}(y_1,\dots,y_{n-1},0)\,dy_{I^*}. \]

Because $d$ and $\mathbf{f}^*$ commute, $\omega^*$ is closed:

\[ d\omega^* = d\,\mathbf{f}^*\omega = \mathbf{f}^* d\omega = \mathbf{f}^* 0 = 0. \]

For $\alpha$ we have

\[
\begin{aligned}
\alpha^*(y_1,\dots,y_{n-1}) &= \mathbf{f}^*\alpha = \sum_{I^*_s} A_{I^*_s}(y_1,\dots,y_{n-1},0)\,dy_{I^*_s}, \\
d\alpha^*(y_1,\dots,y_{n-1}) &= d\,\mathbf{f}^*\alpha = \mathbf{f}^* d\alpha = \sum_{I^*} \left( \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial A_{I^*_s}}{\partial x_{i_s}}(y_1,\dots,y_{n-1},0) \right) dy_{I^*},
\end{aligned}
\]

and the condition $\omega = d\alpha$ implies

\[ \omega^* = \mathbf{f}^*\omega = \mathbf{f}^* d\alpha = d\,\mathbf{f}^*\alpha = d\alpha^*. \]

The equation $\omega^* = d\alpha^*$ then implies the partial differential equations

\[ P_{I^*}(y_1,\dots,y_{n-1},0) = \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial A_{I^*_s}}{\partial x_{i_s}}(y_1,\dots,y_{n-1},0). \]

By the induction hypothesis, the algorithm supplies solutions

\[ A_{I^*_s}(y_1,\dots,y_{n-1},0) \]

to these differential equations. The multi-index $I^*_s$ is an increasing sequence of $k - 1$
positive integers between 1 and $n - 1$, so it equals a unique multi-index heretofore
written as $J$. Conversely, suppose $J$ is an arbitrary $(k-1)$-multi-index. Because
$k - 1 \le n - 2$, at least one integer $j$ in the range from 1 to $n - 1$ is missing from $J$.
Let $I^*$ be the $k$-multi-index constructed by augmenting $J$ by inserting $j$ in the proper
place. If $j$ is in the $\ell$th place in $I^*$, then $I^*_\ell = J$. Thus, every $J$ equals some $I^*_s$, and
we have determined all the functions

\[ A_J(x_1,\dots,x_{n-1},0) \]

that remained to be found at the end of Step 1.

Step 3

Steps 1 and 2 together give us all the functions $A_J(x_1,\dots,x_n)$ defined in a window
centered at 0 in $\mathbb{R}^n$, but as yet we know only that those functions satisfy the partial
differential equations when $x_n = 0$. We can put the matter this way (cf. Step 3 of the
example). The partial differential equation indexed by $I^*$ can be written as $E_{I^*} = 0$,
where

\[ E_{I^*}(x_1,\dots,x_n) = \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial A_{I^*_s}}{\partial x_{i_s}}(x_1,\dots,x_n) - P_{I^*}(x_1,\dots,x_n). \]

By Step 2, $E_{I^*}$ equals zero when $x_n = 0$ (and when $(x_1,\dots,x_n)$ is in the window).
We claim $E_{I^*}$ remains equal to zero for all $x_n$ in an open interval centered at 0. To
prove the claim, it is enough to show $\partial E_{I^*}/\partial x_n = 0$.

In the example, we invoked the integrability conditions to prove the claim. For the
same reason, we invoke them here. The integrability conditions are the coefficients
of the $(k+1)$-form $d\omega$ set equal to zero; therefore we begin by expressing $\omega$ in a
way that allows us to read off those coefficients of $d\omega$. To index $(k+1)$-forms, let
$L = (i_1,\dots,i_{k+1})$, $1 \le i_1 < \cdots < i_{k+1} \le n$, and write

\[ \omega = \sum_{L_s} P_{L_s}(x_1,\dots,x_n)\,dx_{L_s}, \]

where $s = 1,\dots,k+1$. Then

\[ d\omega = \sum_L \left( \sum_{s=1}^{k+1} (-1)^{s-1} \frac{\partial P_{L_s}}{\partial x_{i_s}} \right) dx_L, \]

so the integrability conditions are

\[ \sum_{s=1}^{k+1} (-1)^{s-1} \frac{\partial P_{L_s}}{\partial x_{i_s}} = 0. \]

There are $\binom{n}{k+1}$ such conditions, one for each multi-index $L = (i_1,\dots,i_{k+1})$. We
focus on those multi-indices $L$ for which $i_{k+1} = n$. Then $L = I^*,n$ and

\[ L_s = I^*_s,n \ \text{ if } s < k+1, \quad \text{and} \quad L_{k+1} = I^*. \]

The integrability condition indexed by this $L$ is

\[ \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial P_{I^*_s,n}}{\partial x_{i_s}} + (-1)^{k} \frac{\partial P_{I^*}}{\partial x_n} = 0. \]


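Now let us determine $\partial E_{I^*}/\partial x_n$. For each multi-index $I^*$, we have

\[ \frac{\partial E_{I^*}}{\partial x_n} = \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial}{\partial x_n}\!\left( \frac{\partial A_{I^*_s}}{\partial x_{i_s}} \right) - \frac{\partial P_{I^*}}{\partial x_n} = \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial}{\partial x_{i_s}}\!\left( \frac{\partial A_{I^*_s}}{\partial x_n} \right) - \frac{\partial P_{I^*}}{\partial x_n}. \]

We know that each $I^*_s = J$ for a suitable multi-index $J$; thus we can write, using the
partial differential equation for $A_J$ from Step 1,

\[ \frac{\partial A_{I^*_s}}{\partial x_n} = (-1)^{k-1} P_{I^*_s,n}. \]

Hence

\[ \frac{\partial}{\partial x_{i_s}}\!\left( \frac{\partial A_{I^*_s}}{\partial x_n} \right) = (-1)^{k-1} \frac{\partial P_{I^*_s,n}}{\partial x_{i_s}}, \]

so

\[ (-1)^{k-1} \frac{\partial E_{I^*}}{\partial x_n} = \sum_{s=1}^{k} (-1)^{s-1} \frac{\partial P_{I^*_s,n}}{\partial x_{i_s}} + (-1)^{k} \frac{\partial P_{I^*}}{\partial x_n} = 0 \]

by the integrability condition. This proves the claim, and thus establishes the algorithm.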


Theorem 11.20 (Poincaré lemma). Suppose $\omega$ is a closed $k$-form defined in a window
centered at the origin in $\mathbb{R}^n$. Then there is a $(k-1)$-form $\alpha$ for which $\omega = d\alpha$
in that window. The coefficients of $\alpha$ can be obtained from the coefficients of $\omega$ by
integration (quadrature). □




















11 Stokes’ Theorem 


The effect of the domain

$\beta$ is closed, but...

The $(k-1)$-form $\alpha$ in the Poincaré lemma is not unique: if $\gamma$ is any $(k-2)$-form,
then $\omega = d(\alpha + d\gamma)$. In fact, it follows from the Poincaré lemma that all $(k-1)$-forms
$\beta$ with $\omega = d\beta$ can be expressed this way.

Corollary 11.21 If $\omega = d\alpha = d\beta$, then locally $\beta = \alpha + d\gamma$ for some properly chosen
$(k-2)$-form $\gamma$.

Proof. Note that $\alpha - \beta$ is a closed $(k-1)$-form; therefore, by the Poincaré lemma
it is locally exact. □

The Poincaré lemma says that a closed form will be exact if its domain is sufficiently
simple. The closed 1-form

\[ \omega = \frac{-y\,dx + x\,dy}{x^2 + y^2}, \]

whose domain is the punctured plane $\mathbb{P} = \mathbb{R}^2 \setminus (0,0)$, shows that exactness may be
lost if the domain is even slightly complicated. This example is not isolated; there is
an analogue of $\omega$ in every dimension. We explore them now to get a better idea how
the shape of a domain can become an obstruction to the exactness of a closed form.

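We take the domain to be punctured 3-space $\mathbb{Q} = \mathbb{R}^3 \setminus (0,0,0)$, and define the
2-form

\[ \beta = X\,dy\,dz + Y\,dz\,dx + Z\,dx\,dy \]

by

\[ X = \frac{x}{(x^2+y^2+z^2)^{3/2}}, \qquad Y = \frac{y}{(x^2+y^2+z^2)^{3/2}}, \qquad Z = \frac{z}{(x^2+y^2+z^2)^{3/2}}. \]

Because

\[ d\beta = \left( \frac{\partial X}{\partial x} + \frac{\partial Y}{\partial y} + \frac{\partial Z}{\partial z} \right) dx\,dy\,dz \]

and

\[
\begin{aligned}
\frac{\partial X}{\partial x} &= \frac{(x^2+y^2+z^2)^{3/2} - x \cdot 3x\,(x^2+y^2+z^2)^{1/2}}{(x^2+y^2+z^2)^{3}} = \frac{x^2+y^2+z^2 - 3x^2}{(x^2+y^2+z^2)^{5/2}}, \\
\frac{\partial Y}{\partial y} &= \frac{x^2+y^2+z^2 - 3y^2}{(x^2+y^2+z^2)^{5/2}}, \qquad
\frac{\partial Z}{\partial z} = \frac{x^2+y^2+z^2 - 3z^2}{(x^2+y^2+z^2)^{5/2}},
\end{aligned}
\]

we see $d\beta = 0$ everywhere on $\mathbb{Q}$.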

If $\beta$ were exact, so that $\beta = d\alpha$ for some 1-form $\alpha$ defined on $\mathbb{Q}$, then we would
have

\[ \iint_{S^2} \beta = \iint_{S^2} d\alpha = \oint_{\partial S^2} \alpha = 0, \]

where $S^2$ is the outwardly oriented unit sphere in $\mathbb{R}^3$. The path integral equals zero
because $\partial S^2$ is empty. However, because $x^2 + y^2 + z^2 = 1$ on $S^2$, $\beta$ reduces to radial
flow out of the sphere (cf. p. 396), so

\[ \iint_{S^2} \beta = \iint_{S^2} x\,dy\,dz + y\,dz\,dx + z\,dx\,dy = 4\pi \ne 0. \]

Consequently, $\beta$ is not exact on $\mathbb{Q}$.
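The 2-form $\beta$ is the prototype for the following sequence of examples, one in
each dimension. Let $\mathbb{Q}^n = \mathbb{R}^n \setminus 0$, and let

\[ \beta_{n-1}(x) = \sum_{s=1}^{n} (-1)^{s-1} \frac{x_s}{r^n}\,dx_{N_s}, \]

where $x = (x_1,\dots,x_n)$, $r = \|x\|$, $N = (1,\dots,n)$, and $N_s = (1,\dots,\widehat{s},\dots,n)$. Note that

\[ \beta_1 = \frac{x_1\,dx_2 - x_2\,dx_1}{x_1^2 + x_2^2} \quad \text{and} \quad \beta_2 = \frac{x_1\,dx_2\,dx_3 - x_2\,dx_1\,dx_3 + x_3\,dx_1\,dx_2}{(x_1^2 + x_2^2 + x_3^2)^{3/2}}, \]

so the $(n-1)$-form $\beta_{n-1}$ generalizes $\omega$ and $\beta$. We have

\[ d\beta_{n-1} = \sum_{s=1}^{n} \frac{\partial}{\partial x_s}\!\left( \frac{x_s}{r^n} \right) dx_N = 0, \]

so $\beta_{n-1}$ is closed. If $\beta_{n-1}$ were exact, then we would have $\beta_{n-1} = d\alpha_{n-2}$ for some
$(n-2)$-form $\alpha_{n-2}$ defined on $\mathbb{Q}^n$. Let $S^{n-1}$ be the unit $(n-1)$-sphere in $\mathbb{R}^n$, oriented by
its outward normal; as a set, $S^{n-1}$ consists of all points $x$ in $\mathbb{R}^n$ for which $r = 1$. Then
we would have

\[ \int\!\cdots\!\int_{S^{n-1}} \beta_{n-1} = \int\!\cdots\!\int_{S^{n-1}} d\alpha_{n-2} = \int\!\cdots\!\int_{\partial S^{n-1}} \alpha_{n-2} = 0, \]

because $\partial S^{n-1} = \emptyset$. Nevertheless, $\beta_{n-1}$ is not exact, because

\[ \int\!\cdots\!\int_{S^{n-1}} \beta_{n-1} \ne 0. \]

This follows immediately from the general $n$-dimensional Stokes' theorem (which
we do not prove). In dimension $n = 4$, however, you can prove by a direct computation
(cf. Exercise 11.13) that

\[ \iiint_{S^3} \beta_3 = 2\pi^2. \]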


Let us resume our analysis of $\mathbb{Q} = \mathbb{R}^3 \setminus (0,0,0)$. We distinguish between two
kinds of 2-spheres in $\mathbb{Q}$: those that enclose the origin, and those that do not. Each
oriented sphere $S_I$ of the first kind is the boundary of an oriented ball $B_I$ that contains
the origin. But the origin is not in $\mathbb{Q}$, so, although $S_I$ is the boundary of a ball in
$\mathbb{R}^3$, it is not the boundary of a ball in $\mathbb{Q}$. By contrast, each oriented sphere $S_{II}$ of
the second kind is the boundary of an oriented ball $B_{II}$ that lies entirely in $\mathbb{Q}$. For
spheres of the first kind, we have the following result.

... is not exact

An analogue of $\beta$ in each dimension

The two kinds of spheres in $\mathbb{Q}$







Differential forms that detect holes

Every closed 1-form on $\mathbb{Q}$ is exact

Theorem 11.22. Suppose the origin lies in the interior of the positively oriented ball
$B_I$ in $\mathbb{R}^3$. Let $S_I = \partial B_I$ in $\mathbb{R}^3$; then

\[ E(S_I) = \frac{1}{4\pi} \iint_{S_I} \beta = +1. \]

Proof. We show that $E(S_I) = E(U)$, where $U$ is the unit sphere (centered at the
origin) with its outward orientation; we have already established (p. 504) that
$E(U) = +1$.

Suppose the given sphere, $S_I$, lies everywhere outside $U$, and suppose that $A$ is
the 3-dimensional positively oriented shell that lies between the two spheres; then
$\partial A = S_I - U$. The divergence theorem applies to $\beta$ on $A$; we thus find

\[ E(S_I) - E(U) = \frac{1}{4\pi} \iint_{\partial A} \beta = \frac{1}{4\pi} \iiint_A d\beta = 0, \]

because $\beta$ is closed on $\mathbb{Q}$.

Even if $S_I$ does not lie entirely outside the unit sphere $U$, some concentric enlargement
$\lambda S_I$ does. If $\lambda A$ is the positively oriented 3-dimensional shell that lies
between these concentric spheres, then $\partial(\lambda A) = \lambda S_I - S_I$. Just as in the previous
case, the divergence theorem applies, giving $E(\lambda S_I) - E(S_I) = 0$. Thus, in all cases,
$E(S_I) = E(U) = +1$. □

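Corollary 11.23 If the sphere $S_I$ encloses the origin and has inward orientation,
then $E(S_I) = -1$. □

Theorem 11.24. If $B_{II}$ is an oriented ball that lies entirely in $\mathbb{Q}$ and $S_{II} = \partial B_{II}$, then
$E(S_{II}) = 0$.

Proof. Because $\beta$ is defined everywhere in an open neighborhood of $B_{II}$, the divergence
theorem applies, and

\[ E(S_{II}) = \frac{1}{4\pi} \iint_{S_{II}} \beta = \frac{1}{4\pi} \iiint_{B_{II}} d\beta = 0. \]

□
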
We see that the function $E(S)$ plays the same role for oriented spheres in $\mathbb{Q}$ that
the winding number

\[ W(C) = \frac{1}{2\pi} \oint_C \frac{x\,dy - y\,dx}{x^2 + y^2} \]

(p. 430) plays for oriented circles $C$ in the punctured plane. That is, each domain has
a hole, and a nonzero value of the function indicates that the sphere or circle encloses
that hole. The function is, in each case, determined by the differential form; thus we
can just as well say it is the differential form that detects the hole.
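The hole-detecting behavior of the winding number is easy to observe numerically. In the following sketch the function name `winding_number`, the choice of circles, and the midpoint rule are illustrative assumptions; it integrates the 1-form around a counterclockwise circle that encloses the origin and around one that does not.

```python
import math

def winding_number(center, radius, n=2000):
    """Midpoint-rule value of (1/2pi) * integral of (x dy - y dx)/(x^2 + y^2)
    over the counterclockwise circle with the given center and radius."""
    cx, cy = center
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        dx = -radius * math.sin(t) * dt
        dy = radius * math.cos(t) * dt
        total += (x * dy - y * dx) / (x * x + y * y)
    return total / (2 * math.pi)

w_enclosing = winding_number((0.3, -0.2), 1.0)      # origin lies inside this circle
w_not_enclosing = winding_number((3.0, 0.0), 1.0)   # origin lies outside this circle
print(w_enclosing, w_not_enclosing)
```

The first value is very close to 1 and the second very close to 0, even though the first circle is not centered at the origin.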

In the case of the 3-dimensional region $\mathbb{Q}$, it is a 2-form that detects the hole. Is
there a 1-form on $\mathbb{Q}$ that does the same thing? In other words, is there a 1-form $\alpha$
that is closed on $\mathbb{Q}$ but fails to be exact on $\mathbb{Q}$?
Theorem 11.25. Every closed 1-form $\alpha$ on $\mathbb{Q}$ is exact.

Proof. We construct a function $\Phi(x,y,z)$ for which $d\Phi = \alpha$ on $\mathbb{Q}$. Select a smooth
path $C$ in $\mathbb{Q}$ that starts at a fixed point $(a,b,c)$ in $\mathbb{Q}$ and ends at an arbitrary point
$(x,y,z)$ in $\mathbb{Q}$. Define

\[ \Phi(x,y,z) = \int_C \alpha; \]

for this to be meaningful, we must show the integral is path-independent.

Let $C_1$ be any other smooth path in $\mathbb{Q}$ with the same starting and ending points
as $C$. Then $C_1 - C$ is a closed piecewise-smooth oriented path in $\mathbb{Q}$, and is therefore
the boundary of an oriented surface $\Sigma$ that can be chosen to avoid the origin. In other
words, $\Sigma$ lies entirely in $\mathbb{Q}$, so Stokes' theorem applies. Thus

\[ \int_{C_1} \alpha - \int_C \alpha = \int_{C_1 - C} \alpha = \int_{\partial\Sigma} \alpha = \iint_{\Sigma} d\alpha = 0, \]

because $d\alpha = 0$ by hypothesis, so the integral is path-independent. □

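Because the exterior derivative on forms corresponds to the classical operations
of the gradient, divergence, and curl on scalar and vector fields, we can translate
the relations between closed and exact forms into relations between these operations.
On pages 460-462, we made the following correspondence between fields
and forms in $\mathbb{R}^3$.

\[
\begin{aligned}
f &\leftrightarrow \omega_f = f, \\
\mathbf{F} = (A,B,C) &\leftrightarrow \omega_{\mathbf{F}} = A\,dx + B\,dy + C\,dz, \\
\mathbf{V} = (P,Q,R) &\leftrightarrow \omega_{\mathbf{V}} = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy, \\
H &\leftrightarrow \omega_H = H\,dx\,dy\,dz.
\end{aligned}
\]

Using the differential operator $\nabla$ (i.e., nabla, Definition 3.3, p. 93) to express the
classical operators,

\[ \operatorname{grad} f = \nabla f, \qquad \operatorname{curl}\mathbf{F} = \nabla \times \mathbf{F}, \qquad \operatorname{div}\mathbf{V} = \nabla \cdot \mathbf{V}, \]

we can express the correspondences between those operators and the exterior derivative
in the following way.

\[
\begin{aligned}
\operatorname{grad} f\colon \quad \nabla f = (f_x, f_y, f_z) &\leftrightarrow d(\omega_f) = f_x\,dx + f_y\,dy + f_z\,dz \\
\operatorname{curl}\mathbf{F}\colon \quad \nabla \times (A,B,C) &\leftrightarrow d(\omega_{\mathbf{F}}) = (C_y - B_z)\,dy\,dz + (A_z - C_x)\,dz\,dx + (B_x - A_y)\,dx\,dy \\
\operatorname{div}\mathbf{V}\colon \quad \nabla \cdot (P,Q,R) &\leftrightarrow d(\omega_{\mathbf{V}}) = (P_x + Q_y + R_z)\,dx\,dy\,dz
\end{aligned}
\]

As already noted (pp. 460-462), these correspondences define the forms

\[
\begin{aligned}
\omega_{\operatorname{grad} f} &= d(\omega_f) = f_x\,dx + f_y\,dy + f_z\,dz, \\
\omega_{\operatorname{curl}\mathbf{F}} &= d(\omega_{\mathbf{F}}) = (C_y - B_z)\,dy\,dz + (A_z - C_x)\,dz\,dx + (B_x - A_y)\,dx\,dy, \\
\omega_{\operatorname{div}\mathbf{V}} &= d(\omega_{\mathbf{V}}) = (P_x + Q_y + R_z)\,dx\,dy\,dz.
\end{aligned}
\]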

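Theorem 11.26. Suppose $f$ is a scalar field, and $\mathbf{F}$ a vector field, and each is defined
on an open set in $\mathbb{R}^3$; then

\[ \nabla \times \nabla f = \operatorname{curl}\operatorname{grad} f = 0 \quad \text{and} \quad \nabla \cdot \nabla \times \mathbf{F} = \operatorname{div}\operatorname{curl}\mathbf{F} = 0. \]

Proof. These are just translations of $d^2 = 0$. □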
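A short worked instance may make the two identities concrete. The fields $f = x^2y + z$ and $\mathbf{F} = (xy, yz, zx)$ below are illustrative choices, not taken from the text.

```latex
\[
\nabla f = (2xy,\; x^2,\; 1), \qquad
\nabla \times \nabla f
  = \bigl(\partial_y 1 - \partial_z x^2,\;\; \partial_z (2xy) - \partial_x 1,\;\; \partial_x x^2 - \partial_y (2xy)\bigr)
  = (0,\; 0,\; 2x - 2x) = \mathbf{0};
\]
\[
\nabla \times \mathbf{F}
  = \bigl(\partial_y (zx) - \partial_z (yz),\;\; \partial_z (xy) - \partial_x (zx),\;\; \partial_x (yz) - \partial_y (xy)\bigr)
  = (-y,\; -z,\; -x),
\]
\[
\nabla \cdot (\nabla \times \mathbf{F}) = \partial_x(-y) + \partial_y(-z) + \partial_z(-x) = 0 .
\]
```

In each case the cancellation is an instance of the equality of mixed partial derivatives, which is what $d^2 = 0$ expresses.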

The Poincaré lemma itself translates into the following pair of theorems.

Theorem 11.27. Suppose $\mathbf{F}$ is a vector field and $\operatorname{curl}\mathbf{F} = 0$ in a neighborhood of
some point in $\mathbb{R}^3$; then there is a scalar field $\Phi$ for which $\mathbf{F} = \nabla\Phi = \operatorname{grad}\Phi$ in a
window centered at that point. □

Theorem 11.28. Suppose $\mathbf{V}$ is a vector field and $\operatorname{div}\mathbf{V} = 0$ in a neighborhood of
some point in $\mathbb{R}^3$; then there is another vector field $\mathbf{P}$ for which

\[ \mathbf{V} = \nabla \times \mathbf{P} = \operatorname{curl}\mathbf{P} \]

in a window centered at that point. □

Vector and scalar potentials

The function $\Phi$ in Theorem 11.27 is a potential function of the vector field $\mathbf{F}$
(Definition 1.3, p. 25). Because the vector field $\mathbf{P}$ in Theorem 11.28 stands in the
same relation to the field $\mathbf{V}$, we call $\mathbf{P}$ a vector potential for $\mathbf{V}$. (For the sake of
clarity, we now refer to the function $\Phi$ as a scalar potential for the field $\mathbf{F}$ of Theorem 11.27.)
By extension, we call $\alpha$ a vector potential for any $k$-form $\omega$ whenever
$d\alpha = \omega$. The Poincaré lemma thus asserts the existence of a local vector potential
for any closed $k$-form; indeed, Yap's result [22] is expressed in this language. By
Corollary 11.21, a vector potential is not unique (when it exists).

Irrotational and incompressible flows

Suppose a fluid flow is represented by a continuously differentiable vector
field $\mathbf{V}$. We say the flow is irrotational if $\operatorname{curl}\mathbf{V} = \nabla \times \mathbf{V} = 0$; we say it is
incompressible if $\operatorname{div}\mathbf{V} = \nabla \cdot \mathbf{V} = 0$. In these terms the previous theorems say the
following.

• A gradient flow is irrotational.

• A vortex flow (Definition 11.4, p. 473) is incompressible.

• An irrotational flow is locally a gradient.

• An incompressible flow is locally a vortex flow.

The Laplacian

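There are meaningful ways to compose a pair of classical operators that do not
correspond to the composition $d^2$. One is $\operatorname{div}\operatorname{grad} f = \nabla \cdot \nabla f$, where $f = f(x,y,z)$ is
a scalar field. We have

\[ \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) \quad \text{and} \quad \nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \]

The second-order differential operator

\[ \nabla \cdot \nabla = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \]

that appears here is called the Laplacian; it is also denoted as $\Delta$:

\[ \Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \]

The Laplacian operates on a scalar field (i.e., a function) to produce another scalar
field. We can extend the Laplacian to vector fields by operating on each component.
If $\mathbf{F} = (A,B,C)$, just set

\[ \Delta\mathbf{F} = (\Delta A, \Delta B, \Delta C), \]

another vector field.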

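Two more classical composites that are similarly unrelated to $d^2$ are

\[ \operatorname{grad}\operatorname{div}\mathbf{V} = \nabla(\nabla \cdot \mathbf{V}) \quad \text{and} \quad \operatorname{curl}(\operatorname{curl}\mathbf{F}) = \nabla \times (\nabla \times \mathbf{F}). \]

Each composite operates on a vector field to produce another vector field; the two
are connected by the following identity (see Exercise 11.28):

\[
\begin{aligned}
\operatorname{curl}(\operatorname{curl}\mathbf{F}) &= \operatorname{grad}(\operatorname{div}\mathbf{F}) - \operatorname{div}(\operatorname{grad}\mathbf{F}), \\
\nabla \times (\nabla \times \mathbf{F}) &= \nabla(\nabla \cdot \mathbf{F}) - \Delta\mathbf{F}.
\end{aligned}
\]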
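The scalar Laplacian defined above is simple to approximate with second differences. The sketch below is illustrative only; the function name `laplacian`, the step size, and the two sample fields are assumptions made for the example. It evaluates $\Delta f$ for $f = x^2 + y^2 - 2z^2$ (where the three second derivatives cancel) and for $f = x^2 + y^2 + z^2$ (where they add to 6).

```python
def laplacian(f, p, h=1e-3):
    """Second-order central-difference approximation to the Laplacian of f at p = (x, y, z)."""
    total = 0.0
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        total += (f(*hi) - 2 * f(*p) + f(*lo)) / (h * h)
    return total

# Assumed sample scalar fields.
harmonic = lambda x, y, z: x * x + y * y - 2 * z * z   # Laplacian 2 + 2 - 4 = 0
radial_sq = lambda x, y, z: x * x + y * y + z * z      # Laplacian 2 + 2 + 2 = 6

p = (0.5, -1.2, 2.0)
lap_harmonic = laplacian(harmonic, p)
lap_radial = laplacian(radial_sq, p)
print(lap_harmonic, lap_radial)
```

Applying the same function to each component of a vector field gives the componentwise extension $\Delta\mathbf{F} = (\Delta A, \Delta B, \Delta C)$ described in the text.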


Exercises 

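11.1. Calculate $\operatorname{div}\mathbf{V} = \nabla \cdot \mathbf{V}$ when:

a. $\mathbf{V} = (x\cos y, x\sin y, 0)$.

b. $\mathbf{V} = (y+z, z+x, x+y)$.

c. $\mathbf{V} = (x/yz, y/zx, z/xy)$.

d. $\mathbf{V} = \operatorname{grad} f$, $f(x,y,z) = ax + by + cz$.

e. $\mathbf{V} = \operatorname{grad} f$, $f(x,y,z)$ arbitrary.

11.2. Calculate $\nabla \times \mathbf{F} = \operatorname{curl}\mathbf{F}$ when

a. $\mathbf{F} = (yz, zx, xy)$.

b. $\mathbf{F} = (y+z, z+x, x+y)$.

c. $\mathbf{F} = \operatorname{grad} f$, $f(x,y,z) = ax + by + cz$.

d. $\mathbf{F} = \operatorname{grad}\varphi$, $\varphi(x,y,z) = ax^2 + 2bxy + cy^2 + 2dyz + ez^2 + 2fzx$.

e. $\mathbf{F} = (y^2, z^2, x^2)$.

f. $\mathbf{F} = (z, x, y)$.

g. $\mathbf{F} = \operatorname{grad} H$, $H(x,y,z)$ arbitrary.

11.3. Calculate the circulation $\oint_C \mathbf{F} \cdot d\mathbf{s}$ of the field $\mathbf{F} = (z,0,0)$ around the closed
oriented path $C$ where: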

a. C: (x,y,z) = (cost, sint, j cost), 0 < t < 2n. 

b. $C\colon (x,y,z) = (0, 3\sin t, 3\cos t)$, $0 \le t \le 2\pi$.

c. $C$ is the unit circle in the $(x,y)$-plane, traversed counterclockwise when
viewed from the positive $z$-axis.

d. $C$ is the circle in the plane $z = 2$ of radius $r$ centered at the point $(p,q,2)$,
traversed clockwise when viewed from a position where $z > 2$.

e. $C$ is the circle in the plane $y = 2$ of radius $r$ centered at the point $(p,2,q)$,
traversed clockwise when viewed from a position where $y < 2$.

f. $C$ is the rectangle with vertices $(0,0,0)$, $(1,1,0)$, $(0,2,1)$, $(-1,1,1)$, traversed
in that order.

11.4. For each curve $C$ in Exercise 11.3, let $R$ be the plane region whose boundary
is $C$: $\partial R = C$. Now compute the surface integral

\[ \iint_R (\operatorname{curl}\mathbf{F} \cdot \mathbf{n})\,dA. \]

(You might wish to rewrite this in a form that is more amenable to computation.)

By (the original version of) Stokes' theorem, this surface integral should
have the same value as the path integral that gave the circulation in the corresponding
problem. Do your values agree?

11.5. Calculate the flux of $\mathbf{V} = (x,y,z)$ out of the positively oriented sphere $S$ of
radius $R$ centered at the origin $(x,y,z) = (0,0,0)$ in two ways:

a. First, directly as $\iint_S \mathbf{V} \cdot \mathbf{n}\,dA$.

b. Second, by calculating the divergence of $\mathbf{V}$ over the interior of $S$ and using
the divergence theorem. State the theorem and indicate how you are using
it.

Do the two values agree?

11.6. Let $P$ be the positively oriented rectangular parallelepiped in $(x,y,z)$-space
with $0 \le x \le 5$, $0 \le y \le 3$, $0 \le z \le 2$. Calculate the flux of $\mathbf{V} = (-y, x, 0)$ out
of $P$ in two ways:

a. First, directly as $\iint_{\partial P} \mathbf{V} \cdot \mathbf{n}\,dA$.

b. Second, by calculating the divergence of $\mathbf{V}$ over $P$ and using the divergence
theorem.

Do the two values agree?

11.7. Suppose $S$ is a closed surface in $\mathbb{R}^3$, and the vector $\mathbf{x}$ points from the origin
to an arbitrary point on $S$. Show that

\[ \frac{1}{3} \iint_S (\mathbf{x} \cdot \mathbf{n})\,dA = \operatorname{vol}(S), \]

where $\mathbf{n}$ is the outward unit normal to $S$ at $\mathbf{x}$.

11.8. Prove the divergence theorem for the positively oriented unit tetrahedron $B$
(Theorem 11.3) in the form

\[ \iiint_B d\alpha = \iint_{\partial B} \alpha, \]

where $\alpha = P\,dy\,dz + Q\,dz\,dx + R\,dx\,dy$.

11.9. Use the divergence theorem to calculate the flux of $\mathbf{V} = (x,y,z)$ out of the
sphere $S$ of radius $R$ in $(x,y,z)$-space. Compare your answer with an earlier
calculation of the same quantity (Exercise 10.4, p. 444).

11.10. Use the divergence theorem to calculate the flux of $\mathbf{V} = (-y, x, 0)$ out of the
rectangular parallelepiped $P$ in $(x,y,z)$-space given by

\[ 0 \le x \le 5, \qquad 0 \le y \le 3, \qquad 0 \le z \le 2. \]

Compare your answer with an earlier calculation of the same quantity (Exercise 10.5, p. 444).

11.11. Show that if two bilinear forms $A(\mathbf{v},\mathbf{w})$ and $B(\mathbf{v},\mathbf{w})$ defined on $\mathbb{R}^n$ agree on
a basis for $\mathbb{R}^n$, they agree everywhere.

11.12. Suppose there is a continuously differentiable, single-valued function $\Phi(x,y)$
defined everywhere on the "punctured plane" $\mathbb{P} = \mathbb{R}^2 \setminus (0,0)$ for which

\[ d\Phi = \omega = \frac{x\,dy - y\,dx}{x^2 + y^2}. \]

Show, in the following steps, that this assumption leads to a contradiction.

a. Let $\varphi(t) = \Phi(\cos t, \sin t)$. Show that $\varphi'(t) = 1$ for all $t$.

b. Deduce that $\varphi(t) = t + \Phi(1,0)$ and then that $\Phi(1,0) = 2\pi + \Phi(1,0)$. The
contradiction is then $0 = 2\pi$.


11.13. Show that

$$\iiint_{S^3} \beta = 2\pi^2$$

(cf. p. 505). The following provides one approach.

a. Parametrize $S^3$ with the “spherical coordinates” map $\mathbf{x} = \mathbf{s}(t_1, t_2, t_3)$
(Exercise 5.25, p. 183) and show that $\mathbf{s}$ defines the positive (outward normal)
orientation on $S^3$.

b. Show that $\mathbf{s}^*(\beta) = \cos t_2 \cos^2 t_3 \, dt_1\,dt_2\,dt_3$; then integrate $\mathbf{s}^*(\beta)$ over the appropriate
domain to obtain the result.
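The final integration in part (b) can be checked numerically. The sketch below assumes the parameter domain is $0 \le t_1 \le 2\pi$, $-\pi/2 \le t_2 \le \pi/2$, $-\pi/2 \le t_3 \le \pi/2$; that domain is an assumption here, since Exercise 5.25 is not reproduced in this section. Because the integrand $\cos t_2 \cos^2 t_3$ is a product of one-variable factors, the triple integral splits into three single integrals.

```python
import math

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Assumed parameter domain (not taken from the text):
#   0 <= t1 <= 2*pi,  -pi/2 <= t2 <= pi/2,  -pi/2 <= t3 <= pi/2
half_pi = math.pi / 2
integral_t1 = 2 * math.pi                                   # integrand has no t1 dependence
integral_t2 = midpoint(math.cos, -half_pi, half_pi)
integral_t3 = midpoint(lambda t: math.cos(t) ** 2, -half_pi, half_pi)

sphere_integral = integral_t1 * integral_t2 * integral_t3
print(sphere_integral, 2 * math.pi ** 2)
```

Agreement between the two printed values supports the stated result but does not replace the orientation argument asked for in part (a).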

11.14. Show that $\mathbf{F} = (2x + y\cos xy,\ x\cos xy,\ 2z^2)$ has a scalar potential and find it.

11.15. Does $\mathbf{F} = (2x + y\cos xy,\ x\cos xy,\ 2z^2)$ have a vector potential? If so, find it; if
not, explain why not.

11.16. a. Does $\mathbf{V} = (y+z,\ z+x,\ x+y)$ have a scalar potential? If so, find it; if not,
explain why not.

b. Show that $\mathbf{V}$ has a vector potential and find it.

c. Find another vector potential for $\mathbf{V}$ that has the form $(0, B, C)$, for suitable
functions $B(x,y,z)$ and $C(x,y,z)$. Suggestion: Take your solution $\mathbf{F} =
(P,Q,R)$ to part (b) and construct a function $f$ for which $\partial f/\partial x = -P$.
Then $\mathbf{F} + \operatorname{grad} f$ will solve the problem. Explain why.

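11.17. Show that the following system of partial differential equations has a solution
(consisting of three functions $P(x,y,z)$, $Q(x,y,z)$, and $R(x,y,z)$), and
find the solution.

$$\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} = yz, \qquad \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x} = zx, \qquad \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = xy.$$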

Explain how this question is connected with divergence, gradient, and curl. 

11.18. Explain why the following system of partial differential equations has no
solution $f(x,y,z)$.

$$\frac{\partial f}{\partial x} = -y + x, \qquad \frac{\partial f}{\partial y} = x + y, \qquad \frac{\partial f}{\partial z} = z.$$

Explain how this question is connected with divergence, gradient, and curl. 

11.19. Explain why the following system of partial differential equations has no 
solutions P(x,y,z), Q(x,y,z), and R(x,y,z). 


dR dQ 


dP dR 


_ dQ dP 
dy dz 1 dz dx dx dy 

Explain how this question is connected with divergence, gradient, and curl. 

11.20. Let $\omega(x,y,u,v)$ be the 2-form in $\mathbb{R}^4$ defined by

$$\omega = A\,dx\,dy + B\,dy\,du + C\,du\,dv + D\,dv\,dx + E\,dx\,du + F\,dy\,dv.$$

a. Show that if $\omega$ is closed (i.e., $d\omega = 0$), then the coefficients of $\omega$ must
satisfy the following partial differential equations:

$$A_u + B_x - E_y = 0, \qquad C_x + D_u + E_v = 0,$$

$$B_v + C_y - F_u = 0, \qquad D_y + A_v + F_x = 0.$$

b. Suppose $\omega$ is exact; then $\omega = d\alpha$ for some 1-form $\alpha = P\,dx + Q\,dy +
R\,du + S\,dv$. Show that the coefficients $P$, $Q$, $R$, and $S$ of $\alpha$ must satisfy the
following partial differential equations in terms of the known coefficients
$A, \ldots, F$ of $\omega$.

$$Q_x - P_y = A, \qquad R_y - Q_u = B, \qquad S_u - R_v = C,$$

$$P_v - S_x = D, \qquad R_x - P_u = E, \qquad S_y - Q_v = F.$$

11.21. Let $\omega = A\,dx\,dy + B\,dy\,dz + C\,dz\,dx$ be a 2-form in $\mathbb{R}^3$.

a. Assume that $\omega$ is closed. Determine the partial differential equation (there
is only one) that the coefficients $A$, $B$, and $C$ must satisfy.

b. Assume that $\omega$ is exact, with $\omega = d\alpha$, where $\alpha = P\,dx + Q\,dy + R\,dz$.
Determine the conditions that the coefficients $P$, $Q$, and $R$ of $\alpha$ must
satisfy. (The conditions are three partial differential equations for $P$, $Q$,
and $R$ in terms of the given $A$, $B$, and $C$.)

11.22. Let $\beta = P\,dx + Q\,dy + R\,dz$ be a 1-form in $\mathbb{R}^3$.

a. Assume that $\beta$ is closed. Determine the conditions that the coefficients $P$,
$Q$, and $R$ must satisfy.

b. Assume that $\beta$ is exact, so $\beta = df$ for some function $f(x,y,z)$ on $\mathbb{R}^3$.
Determine the conditions that $f$ must satisfy.

11.23. Suppose the 1-form $\omega = P\,dx + Q\,dy + R\,dz$ is closed in a window $W$ centered
at the point $(x,y,z) = (a,b,c)$. By analogy with Theorem 11.18, the exterior
derivative of the 0-form

$$\Phi(x,y,z) = \int_a^x P(t,b,c)\,dt + \int_b^y Q(x,t,c)\,dt + \int_c^z R(x,y,t)\,dt$$

should equal $\omega$ in $W$. Write down the integrability conditions defined by
$d\omega = 0$; then use those conditions to establish $\Phi_x = P$, $\Phi_y = Q$, $\Phi_z = R$, and
hence to prove that $d\Phi = \omega$.
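The three-integral formula can be tried out numerically on a specific closed form before proving the general statement. The sketch below is a check under stated assumptions, not a proof: it borrows the coefficients of the field in Exercise 11.14 as an illustrative closed 1-form, takes the base point $(a,b,c) = (0,0,0)$, evaluates the integrals with Simpson's rule, and compares a finite-difference gradient of the result with $(P, Q, R)$. All function names are hypothetical.

```python
import math

# Illustrative closed 1-form P dx + Q dy + R dz (coefficients borrowed from
# Exercise 11.14; any closed form would serve).
def P(x, y, z):
    return 2 * x + y * math.cos(x * y)

def Q(x, y, z):
    return x * math.cos(x * y)

def R(x, y, z):
    return 2 * z * z

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def phi(x, y, z, a=0.0, b=0.0, c=0.0):
    """Three-integral candidate potential based at (a, b, c)."""
    return (simpson(lambda t: P(t, b, c), a, x)
            + simpson(lambda t: Q(x, t, c), b, y)
            + simpson(lambda t: R(x, y, t), c, z))

def grad_phi(x, y, z, h=1e-4):
    """Central-difference approximation to (phi_x, phi_y, phi_z)."""
    return (
        (phi(x + h, y, z) - phi(x - h, y, z)) / (2 * h),
        (phi(x, y + h, z) - phi(x, y - h, z)) / (2 * h),
        (phi(x, y, z + h) - phi(x, y, z - h)) / (2 * h),
    )

point = (0.7, -0.4, 0.9)
gradient = grad_phi(*point)
coefficients = (P(*point), Q(*point), R(*point))
max_error = max(abs(g - w) for g, w in zip(gradient, coefficients))
print(max_error)
```

A small `max_error` at several sample points suggests the construction works for this form; replacing the coefficients with a form that is not closed should make the discrepancy visible, which is where the integrability conditions enter.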

11.24. Extend the result of the previous exercise to $n$ dimensions, as follows. Suppose
the 1-form $\omega = P_1\,dx_1 + \cdots + P_n\,dx_n$ is closed in a window $W$ centered at
$(x_1, \ldots, x_n) = (a_1, \ldots, a_n)$.

a. Write down the integrability conditions described by $d\omega = 0$.

b. Express the 0-form $\Phi$ as a sum of integrals of the various coefficients
$P_i$ of $\omega$. The expression must reduce to the one in the previous exercise
(mutatis mutandis) if $n = 3$.

c. Use the integrability conditions to establish $\partial\Phi/\partial x_i = P_i$, $i = 1, \ldots, n$,
and hence to prove that $d\Phi = \omega$.

11.25. Verify the statements in the text (p. 497) about the number of partial differential
equations and integrability conditions that are involved in establishing
local exactness of a closed form.


11.26. Show that

$$\binom{n-1}{k-1} + \binom{n-1}{k-2} = \binom{n}{k-1} \quad \text{if } n > k.$$

(Note: This yields “Pascal’s triangle.”)
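Before proving the identity, it can be spot-checked over a range of values. The sketch below uses Python's `math.comb`; the ranges chosen and the helper name `pascal_holds` are arbitrary illustrative choices, and a finite check is of course not a proof.

```python
from math import comb

def pascal_holds(n, k):
    """Check C(n-1, k-1) + C(n-1, k-2) == C(n, k-1) for one pair (n, k)."""
    return comb(n - 1, k - 1) + comb(n - 1, k - 2) == comb(n, k - 1)

# k starts at 2 so that k - 2 is nonnegative; n > k as the exercise requires.
all_ok = all(pascal_holds(n, k) for n in range(3, 30) for k in range(2, n))
print(all_ok)
```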


11.27. The algorithm for the general Poincaré lemma (i.e., for a $k$-form in $n$ variables)
involves $\binom{n}{k+1}$ integrability conditions.

a. Show that, in carrying out the induction step from $m - 1$ variables to $m$
variables, the algorithm uses only $\binom{m-1}{k}$ of those integrability conditions.
Show, moreover, that different integrability conditions are used for each
distinct value of $m$. Thus, as $m$ successively takes the values $k+1, \ldots, n$,
a total of

$$\sum_{m=k+1}^{n} \binom{m-1}{k}$$

integrability conditions are used.

b. Show that all integrability conditions are used in the algorithm by showing
that

$$\sum_{m=k+1}^{n} \binom{m-1}{k} = \binom{n}{k+1}.$$
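The counting claim in part (b) is that the per-step counts $\binom{m-1}{k}$, summed over $m = k+1, \ldots, n$, account for all $\binom{n}{k+1}$ integrability conditions. A minimal sketch that tallies both sides for small cases follows; the function name `conditions_used` and the ranges are illustrative assumptions, and the check does not replace the induction argument.

```python
from math import comb

def conditions_used(n, k):
    """Sum of C(m-1, k) over the induction steps m = k+1, ..., n."""
    return sum(comb(m - 1, k) for m in range(k + 1, n + 1))

# Compare with the total number C(n, k+1) of integrability conditions.
counts_match = all(
    conditions_used(n, k) == comb(n, k + 1)
    for n in range(1, 13)
    for k in range(0, n)
)
print(counts_match)
```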



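11.28. a. Express $\operatorname{curl}(\operatorname{curl}\mathbf{F}) = \nabla \times (\nabla \times \mathbf{F})$ in terms of the components of $\mathbf{F} =
(A,B,C)$ and their derivatives.

b. Interpret the identity $\mathbf{U} \times (\mathbf{V} \times \mathbf{W}) = (\mathbf{U}\cdot\mathbf{W})\mathbf{V} - (\mathbf{U}\cdot\mathbf{V})\mathbf{W}$ in a suitable
way for $\nabla \times (\nabla \times \mathbf{F})$ to show that

$$\operatorname{curl}(\operatorname{curl}\mathbf{F}) = \operatorname{grad}(\operatorname{div}\mathbf{F}) - \operatorname{div}(\operatorname{grad}\mathbf{F}) = \nabla(\nabla\cdot\mathbf{F}) - (\nabla\cdot\nabla)\mathbf{F}.$$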


11.29. Use the divergence theorem to prove that

$$\iiint_R \{\nabla f\cdot\nabla g + f\,(\nabla\cdot\nabla g)\}\,dV = \iint_{\partial R} f\,\nabla g\cdot\mathbf{n}\,dA,$$

where $R$ is a region in $\mathbb{R}^3$, $\mathbf{n}$ is the unit normal outward on the surface $\partial R$,
and $f$ and $g$ are smooth functions defined on $R$.

Symbols

$\|\mathcal{G}\|$ mesh size of the grid $\mathcal{G}$ 291

$\binom{k}{j}$ binomial coefficient (“$k$ choose $j$”) 90, 439

( k ) multinomial coefficient 94 

$|||L|||$ norm of the linear map $L$ 104
$\langle\ ,\ \rangle$ integral (symbolic) pairing 427
$\lfloor x \rfloor$, $\lceil x \rceil$ floor, ceiling of $x$ 283

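$\mathbf{v} \wedge \mathbf{w}$ oriented parallelogram determined by $\mathbf{v}$
and $\mathbf{w}$ 41

$\mathbf{x} \wedge \mathbf{y} \wedge \mathbf{z}$ oriented parallelepiped determined by
$\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ 42

$\nabla$ differential operator (nabla) 74, 93

$\nabla\cdot$ divergence operator 461

$\nabla\times$ curl operator 462

$\omega_f$, $\omega_{\mathbf{V}}$ differential form corresponding to
function $f$, vector field $\mathbf{V}$ 460, 507
$\oplus$ addition of velocities in special relativity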
217 

$\overline{\phantom{S}}$ closure operator (overline), when used to
denote the closure $\overline{S}$ of a set $S$ 277
$\partial$ boundary operator 277, 427-429
$\partial(f,g)/\partial(u,v)$ Jacobian of $f, g$ with respect
to $u, v$ 137

$\partial_1 f(x,y) = \partial_x f(x,y)$ partial derivative of $f$ with
respect to its 1st (i.e., $x$) variable 206
$\hat{\ }$ (circumflex) when used to delete an item
from a list 197,441,500 
* when used to indicate the pullback of a 
differential form 431—437 
° interior operator (°S is the interior of S ) 

277 

1 transpose operator ( M ^ is the transpose of 
the matrix M) 42 
\ set difference 286 


x cross-product 42 
$\Delta\mathbf{x}\cdot\nabla$ 93

indicates an oriented object, e.g., C 7,353, 
388,392,428 
$\wedge$ wedge product

of differential forms 423 
of vectors 41,46 

A 

$\underline{A}$, $\overline{A}$ inner, outer area 294
acceleration 

gravitational acceleration 269 
addition of velocities (special relativity) 217 
additivity 

of integrals 298 
of Jordan content 286 
of work on displacements 6 
adjoints (in a symbolic pairing) 428, 436 
agree at least to order p 
at an arbitrary point 90 
at the origin 88 
angular speed (scalar) 462 
angular velocity vector 462 
anticommutative 423 
antisymmetric 
form 62,63 
matrix 243,470 
arc length 14 

element of arc length 14, 408 
arccosh (inverse hyperbolic function) 151 

arcsech (inverse hyperbolic function) 177 

arcsinh (inverse hyperbolic function) 153 

arctangent function (with two arguments) 

59,494 
graph 430 

arctanh (inverse hyperbolic function) 177
Index


area 

as Jordan content 294 
element of area 272 
element of oriented area 356 
signed area 41,356 
arrowhead 384 
average value see mean value 

B 

$B_r$ ball of radius $r$ 167

Babylonian algorithm 157
ball (of radius $r$ in $\mathbb{R}^n$) 167
basic k-form 423 

BASIC program see computer program 
basis vector 31 

“best-fitting” see Taylor polynomial 
big oh 87 
bilinear (form) 62 
boundary (of a set) 277 
boundary operator 277, 427 

adjoint to exterior derivative 428 
boundary point 277 
Bourton-on-the-Water 166 
branch (of a function) 152, 154, 156 
Buck, R. C. vii 

C 

$^c$ complementation operator ($S^c$ is the
complement of the set $S$) 277
Cauchy sequence 168 
ceiling (of a real number) 283 
chain rule 132,129-140 

for component functions 136 
for Jacobians 138 
change of variables 1-5,21 
for a differential form 429—443 
for double integrals 339-352 
oriented version 357 , 358-363 
unoriented version 339-350, 350-352 
with a push-forward 341 
with Green’s theorem 370 
for integral pairings 437 
for single integrals 337-339 
oriented version 337 
unoriented version 338-339 
for triple integrals 
oriented, unoriented versions 363 
orientation-reversing 338-339 
characteristic 
equation 35 
repeated root 38 
polynomial 35


value, vector 35 
circulation 466, 464-473
in Stokes’ theorem 485 
per unit area 491 
closed disk 276 
closed form 494 

locally exact closed form 496 
closed set 277 
closure (of a set) 277 
co-latitude 446 
codim (codimension) 51,211 
codimension 51 

of an embedded surface patch 211 
collapse 39 
common refinement 300 
commutative diagram 131,163 
completing the square, in Morse’s lemma 
234,256 

complex conjugate 245 

composite (of two maps) 131, see also factor 

computer program 

Mathematica 108, 122, 164
to compute a gravitational field 271 
conformal map 118, 165, 359 
conjugate transpose 267 
content see also Jordan content 
for an arbitrary collection of grids 
288-294 

contraction mapping 167 

contraction mapping principle 167-169 
convergence 

of an improper integral 331-336 
of an infinite series 331 
coordinate change 158-165, see also change 
of variables 

as translation dictionary 192 
can reverse concavity 222-223 
cylindrical 178 
in a path integral 434 
in a surface integral 434 
in Morse’s lemma 234-239, 253-257 
in surface parametrizations 399—401 

linear 32 
polar 112 
spherical 178,446 
to produce a pure square 219-224 
to straighten level sets 189-191, 193, 195, 
201-203 

to transform a quadratic form 226, 
240-242,244 
coordinate function 192 
coordinate grid 29-39 
coordinates 32 
as languages 192
biangular 178
bipolar (two-center) 178 
curvilinear 165 
cylindrical 178 
spherical 178,446 
in $\mathbb{R}^4$ 183
window 113 

cosh (hyperbolic function) 22 
Courant, R. xii 
critical point 189 

degenerate, nondegenerate 221,228, 246 
function of one variable 219-224 
Morse’s lemma 221 
second derivative test 221 
function of several variables 243-264 
Morse’s lemma 248-263 
second derivative test 263 
function of two variables 224-243 
Morse’s lemma 228,233-237 
second derivative test 229 
Hessian, Hessian form 227, 246 
index (index of inertia) 263 
isolated, nonisolated 225,229,264 
cross-product 42 
crosscap 125-128 

fails to be an immersion 215 
CUPM (The Committee on the Undergraduate 
Program in Mathematics) xii 
curl 461 

as cross product with nabla 462 
in Stokes’ theorem 485 
physical meaning 462-482
curve 

closed 9 
oriented 7 
parametrized 7 

partition (respecting the orientation) 8 
piecewise-smooth 9 
simple 7 
smooth 7 

space curve as a locus 199 
unoriented 17-20 
curve integral see path integral 
cusp 376 

Cyrillic alphabet 427 

D


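$D(f,S)$ Darboux integral of $f$ over $S$ 301
$\underline{D}$, $\overline{D}$ lower, upper Darboux integrals 300
$\underline{D}_{\mathcal{G}}$, $\overline{D}_{\mathcal{G}}$ lower, upper Darboux sums over the
grid $\mathcal{G}$ 300

$D_{\mathbf{u}}$ directional derivative in the direction $\mathbf{u}$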
109 


d derivative operator 99, 129 

$\partial$ boundary operator 277, 427-429
d exterior derivative of a differential form 
425 

Darboux integral 301 , 299-310 
equals Riemann integral 301-304 
degenerate 

critical point 221,228, 246 
quadratic form 227, 245 
derivative 129 

as linear part of Taylor polynomial 99 
as matrix of partial derivatives 99 
directional derivative 109 
of a set function 312, 362 
of the inverse map 136 

partial derivative 107,206 
product rule 130 
det (determinant) 35, 66 
determinant 36,66,61-66 
diagram see commutative diagram 
diameter (of a set) 291 
Dieudonne, J. xi 
difference see set difference 
differentiability viii, 105-111, 129 
continuous 174-175 
differentiable 
function 106 
map 115,129 

differential see exterior derivative 
differential form 423-443
closed see closed form 
correspondence with a (scalar or vector) 
field 460,507 
exact see exact form 
in n variables 439—443 
locally exact see closed form 
dilation see uniform dilation 
dimension 

of an embedded surface patch 209 
directional derivative 109 

expressed using the gradient 110 
displacement 6 
div (divergence) 451 
divergence 451, 449-459
as dot product with nabla 461 
physical meaning 449—451, 479—482 
divergence theorem 456 , 452—459 
expressed with integral pairings 456 
double cover 117 
double integral 295 

absolute (unoriented) 296 
oriented 353, 356 
double sum 297
E

e base of natural logarithms 20 
eigenspace (associated with an eigenvalue) 
262 

eigenvalue 35 

complex eigenvalue 36 
multiplicity of an eigenvalue 261 
of a quadratic form 241-243 
of a symmetric matrix 245 
of the Hessian 228-229 
eigenvector 35 

of a quadratic form 241-243 
element 

of arc length 14,408 

of area 272 
in polar coordinates 276 
oriented (signed) 356 
of surface area 404,408 
embedded surface patch 209 
codimension 211 
dimension 209 
embedding 210 
equivalence class 33, 40 
equivalent matrices 33 
eigenvalues of 36 
Euclidean motion 288 
exact differential equation 493 
exact form 494 
exterior derivative 425 

adjoint to boundary operator 428 
correspondence with differential operators 
507 

product rule 440 
exterior form see differential form 
exterior point 277 
exterior product 423 

F 

factor (one map factoring through another) 
163-164 

fails to vanish see vanish 
Feynman Lectures xii, 411 
Feynman, R. xii 

first integral (of a differential equation) 493 
fixed point (of a map) 156-158, 166, 
167-169 

floor (of a real number) 283 
flow see flux; see also total flux 
flux 387 
steady flow 388 

through a curved surface 392—401 
total see total flux 


fold map 146, 373-374 
folium of Descartes 237-240 
force 

and work 6 
gravitational force 269 
Foy, R. xiii 

fundamental theorem of calculus 428 
expressed with integral pairings 429 

G 

G content, G-measurability 291 
G, G inner, outer G content 291,294 
G g, Gg inner, outer ./-content estimates 293 
Gf,. Ga inner, outer ./-content estimates 291 
Q k general (nested) grid 290 
{Q} general (unnested) integration grids 
293-294 

Gauss’s theorem see divergence theorem 

general position 200 

generator (of a surface of revolution) 148 

Gleason, A. viii, xii 

grad (gradient) 25, 75 

gradient 25,74 

as scalar product with nabla 461 
connection with directional derivative 110 
gradient vector 110 
graph 48 

difficulty visualizing 112 
implicitly defined 185,208 
semi-log graph paper 160 
gravitational acceleration 269 
gravitational field 269-276 
of a hollow sphere 409—412 

via iterated integrals 323-325 
greatest lower bound 300 
Green’s theorem 364-377 
expressed with integral pairings 427 
in terms of differentials 427 
in the change of variables formula 370 
on more general domains 368 
gutter 225 

H 

$H$ content, $H$-measurability 288
$\underline{H}$, $\overline{H}$ inner, outer $H$ content 288
$\underline{H}_k$, $\overline{H}_k$ inner, outer area estimates 288
$\mathcal{H}_k$ grid 288
harmonic function 384 
Hessian, Hessian form 227, 246 
hyperbolic functions 23 
inverses 151-153
I

$i = \sqrt{-1}$ 35

im (image of a linear map) 46 
image grid 29-39, 113-119, 122-128, 
160-161, 164-165 
image of a map viii, 29,46, 112 
immersion 212-215 
implicit function theorem ix-xi, 189, 193, 
195,203,207 
implicit functions 

given by a single equation 185-198 
given by linear equations 47-52 
given by several equations 205-215 
given by two equations 198-205 
improper integral 273, 326-337 
compared with infinite series 330-331 
convergence 327, 334,337 
unbounded domain 329, 336-337 
unbounded function on bounded domain 
328, 329-336 
incompressible 508 

index (of a nondegenerate critical point) 263 
index (of a quadratic form) 258 
index of inertia (Sylvester’s) 262 
induced orientation 392 
induction proof 79,255 
inf (infimum) 300 
initial-value problem 154 
injection 56,212-215 
inner Jordan content 281 
inner volume 309 
integrability condition 495 
integrable function 295,301 
integral 

as a symbolic pairing 427 
as “product” 272 
as Riemann-Darboux integral 304 
Darboux see Darboux integral 
improper see improper integral 
iterated see iterated integral 
path see path integral 
proper 326 

Riemann see Riemann integral 
surface see surface integral 
0-dimensional oriented integral 429 
integral pairing 

of a region and a differential form 427 
to express Green’s theorem 427 
to express the divergence theorem 456 
to express the Fundamental Theorem of 
Calculus 429 

integral type see set function 
integrating factor 495 


integration by parts 80 
integration grids 293-294, 295 
interior (of a set) 277 
interior point 276 
invariant line 29, 35 
inverse 

in solving a differential equation 153 
of a map 154-156 
inverse function theorem ix-xi, 169, 
165-176 

compared with Taylor’s theorem 176 
irrotational 508
iterate 158 

iterated integral 317-326 

J 

J Jordan content 281 

Jf Jacobian of the map f 137 

J_ k , Jk inner, outer area estimates 281 

Jk Jordan content grid 280 

J, J inner, outer Jordan content 281 

Jacobian 137 

as derivative of a set function 362 
as local area magnification factor 343, 
352,362-363 

in change of variables formula 339-340 
its effect on orientation 355, 362-363 
Jacobian matrix 137 
of the inverse map 138 
Jakus, S. xii 
John, F. xii 

Jordan content 269, 281, 278-294 
and ordinary area 282-283, 290, 292 
in higher dimensions 294, 308-310 
Jordan measurability (of a set) 281 

K 

$k$-form see differential form
$k$-parallelepiped in $\mathbb{R}^n$ ($k < n$) 67
$k$-volume in $\mathbb{R}^n$ ($k < n$) 69

Kaplan, W. vii 
ker (kernel of a linear map) 46 
kernel 35,46 

L 

Lagrange’s form of the remainder 82, 94 

Lang, S. xi, 169 

Laplacian 509 

latitude, longitude 122, 447
law of the mean 71-76, see also mean-value
theorem 
basic 71 

compared to the mean-value theorem 72 
for double integrals 76 
for functions on $\mathbb{R}^n$ 75

for single integrals 72 
generalized 73 
Leach, K. xiii 
least upper bound 300 
Leibniz notation 131 
line integral see path integral 
linearization of the locus 187, 194 
little oh 86 

local linearity viii-x, see also differentiability

as shown by level curves 110-111 
as shown by the image of a map 112-121 
locally exact see closed form 
locally linear 
function 4, 106 
map 115,120,129 
locus 47,185 

as graph of a function or map 185,208 

as level set or contour 189 
look linear locally viii-ix, 116-121, 

163-165, 175-176, 197-198,205,209, 
213 

contrasted with local linearity 119-121, 
128 

lower Darboux integral, sum 300 

M


magnification factor see multiplier 

manta ray 108 

map 

log-log 161 

of the plane to itself 112-121 
linear 29—42 
semi-log 161 

map singularity see singular point 
mass 

gravitational mass 269 
of a plane region 270,310 
of a wire 18 

mass density 18,270, 387 

as limit of average mass density 310 
Mathematica viii, xi, 108, 122, 164 
matrix 

antisymmetric matrix 243, 470 
conjugate transpose 267 
of a quadratic form 226 
symmetric matrix 60, 470 


transpose 42,226 
mean (of a random variable) 20 
mean value 73,75 

mean-value theorem 71-77, see also law of 
the mean 
basic 72 

for functions on $\mathbb{R}^n$ 75
for maps 139-140 
for vector-valued functions 76 
measurability see also Jordan measurability 
for an arbitrary collection of grids 
288-294 

mesh of a partition 8 
mesh size (of a grid) 291,295-297 
microscope equation 3, 115 
generalized 83,95 
microscope window 111, 113-115, 

117-119, 122-125 
minimax 224 
Mobius strip 415 
model village 166 
moment 316 
Morrey, C. xii 

Morse’s lemma x, xi, 221,228, 248, 
243-264 

applied 233-237 

compared with Taylor’s theorem 248-249 
Morse, M. 248 
multi-index 439-443, 499-503
multilinear (form) 63 
multinomial 

coefficients 94 
expansion of $(\Delta\mathbf{x}\cdot\nabla)^k$ 94

multiple-valued (function) 430 
multiplicity (of an eigenvalue) 261 
multiplier 3 

area multiplier 41,42, 115, 125,292,343, 
362 

for Jordan content 292 

length multiplier 29 

local multiplier 4, 115,125,137, 343, 362 

volume multiplier 44,137 

N 

$n$-parallelepiped 46
volume of 46 
n-volume 46 
nabla (“$\nabla$”) 74, 93, 461

Newton-Raphson method 157
nondegenerate 

critical point 221,228,246 
quadratic form 227, 245 
nonoverlapping sets 285
norm

of a derivative 139 

of a linear map 104 
normal 

determines orientation 388 
orienting normal of a surface parametrization 393

orienting unit normal 390 
to a plane region in space 388 
normal density function 21,311 
normal distribution 20, 311
null space 46 
nullity 47 

O 

O (“big oh”) 87 

$O_{p\times k}$ the zero matrix with $p$ rows and
$k$ columns 49
$o$ (“little oh”) 86
$O(p)$ (“order at least $p$”) 87

in Taylor’s formula 87
o(p) (“order greater than p ”) 86 

in defining differentiability 105, 129 
open disk 276 
open set 277 

order of integration 321,357 
order of vanishing xii, 85-90, 95, 98, see 
also vanish 
orientation 

and sense of rotation 353 
and the Jacobian 355, 362-363 
determined by normal 388
induced 

by a differentiable map 355 
by a linear map 30, 41, 44, 353 
on the boundary 355,389 
induced orientation 392 
of a curve 7 

of a displacement 6 

of a parallelogram 41
of a piecewise-smooth surface 420 
of a plane 30 

of a region in the plane 353-355 

of a smooth surface 417 
of a 0-dimensional region 428 
of an ordered set of vectors 45, 353-355 
positive, negative 41,42,45,353, 354 
preserving, reversing 137 
oriented surface integral see surface integral 
oriented surface patch 392 
orienting normal of a surface parametrization 
393 

orthogonal matrix 259 


outer Jordan content 281 

outer volume 309 

overlap (overlapping sets) 285 

P 

$P_{n,a}$ Taylor polynomial of degree $n$ centered
at a 77 

pairing see integral pairing 
parallelepiped see also $n$-parallelepiped and
$k$-parallelepiped in $\mathbb{R}^n$
oriented parallelepiped 42 
volume of 43 
parallelogram 
area of 41 
in R 3 44 

oriented parallelogram 41 
orienting unit normal 390 
parameter 7 

arc-length parameter 15 
parametrization 

arc-length parametrization 15 
of a curve 7 
of a surface 121 
of the crosscap 125 
of the unit sphere 121 

unit-speed parametrization 17 
parametrized surface 121-128 
local area multiplier 139 
partial derivative 107, 206 
partial differential equations 

as integrability conditions 496-503 
partition of a curve see curve 
partition of a plane region 276 
patch (surface patch) see surface 
path integral 6-20 

converting to an ordinary integral 9 
of a scalar function 18 

of a vector function 8 
transformed by a coordinate change 434
pathwise connected 354,417 
permutation 65 
even, odd 65 
plane topology 276-278 
pleat 374-377 
Poincaré lemma 503, 497-504
for vector fields 508 
polar coordinates 20-21, 112,273-276 
in a differential form 429 
overlay 116-121 
polygonal approximation 14 
potential see scalar potential, vector 
potential 

primitive (of a differential equation) 493
principal axes (of a quadratic form) 240-243
principal axes theorem 242, 260 
probability 20 

probability density function 311 
product rule 130 

for the exterior derivative 440 
program see computer program 
projection 50,205,206-212 
proof by induction 79, 255 
proper value, vector 35 
Protter, M. xii 
pullback 2-5, 10,159 
by a submersion 210-212 
on a differential form 431—437 
on a general $k$-form 442-443
punctured 3-space 504 
punctured plane 495, 504 
pure dilation see uniform dilation 
push-forward 2-5, 160 
by an immersion 213-215 
in a change of variables 341 
“Pythagorean” formula 45, 55 

Q 

quadratic form 226, 243 

degenerate, nondegenerate 227, 245 
index 258 
level curves 240,242 
negative definite, positive definite 258 
principal axes 240-243 
quadratic map 116-121, 161-165, 340 
quadrature 492 

R 

$R_\theta$ matrix that rotates the plane by $\theta$ radians
39

$R_{n,a}$ remainder in Taylor’s formula for the $n$th
degree polynomial centered at $a$ 79
random variable 20 
rank (of a matrix) 47 
rank-nullity theorem 47 
rational point 276 
ravioli 380 
refine (a grid) 279 

refinement (of one grid by another) 280 

reflection 30 

regular 

curve 189 
point 189,195,221 
relative flow 474-482 
relative-flow map, field 475 
remainder (in Taylor’s formula) 79 


restricted Riemann sum 307 
Riemann integral 296, 297-299, 301-310 
absolute (unoriented) 296 
equals Darboux integral 301-304 
oriented 356 

Riemann sum 272, 295, 297 
restricted 307-308 
Riemann-Darboux integral 304 
rotation 30 

rotation-dilation 40 
Rudin, W. xii 

S 

scalar potential 25, 508 

scalar triple product 43 

Schwartz, J. x, 343, 350 

sech (hyperbolic function) 177 

second derivative test 221,229,263 

seed (for an implicit function) 185 

semi-log 

graph paper 160 
map si 161 

sense of rotation see orientation 
set difference 286 
set function 310-312 

derivative of a set function 312 
of integral type 311-312 
sgn (signum function) 356 
shear 38,231 
nonlinear shear 190 
shear-collapse 40 
shear-dilation 40 
signed area 41,356 
similarity 118 

singular point, singularity ix, 263 
sinh (hyperbolic function) 22 
skew-symmetric see antisymmetric 
si semi-log map 161 
solving equations 151-158 
by finding fixed points 156-158 
space curve as a locus 199 
sphere 121-125 
hollow 409-412

standard deviation (of a random variable) 20 
steady flow 388 
Steenrod, N. viii 
Stokes’ theorem 484, 482—492 
physical form 485 
strain 31,33,40 
strain-collapse 40 
stretch factor see multiplier, length 
submersion 209,208-212 
substitution see also change of variables
integral 1-5
pullback see pullback 
push-forward see push-forward 
sup (supremum) 300 
surface 

as a locus (level set) 192-195, 198-205 
embedded surface patch 209 
oriented (oriented surface patch) 392,
392-404

parametrized surface 121-128 
local area multiplier 139,406 
piecewise-smooth 412 
orientable, oriented 420 
smooth 415 
orientable, oriented 417 
surface area 405-407, 407

element of surface area 404, 408 
for a given parametrization 406 
surface integral 402 
alternate form 403-404
compared with path integral 403 
of a scalar function 409, 409-412

on a piecewise-smooth surface 413 
on a piecewise-smooth oriented surface 
420 

on a smooth oriented surface 417 
reformulated using a pullback 432 
transformed by a coordinate change 434 
surface of revolution 148 
surface patch see surface 
Sylvester’s law of inertia 262-263 
Sylvester, J. 262 

symbolic pairing see integral pairing 
symmetric matrix 60, 243,470 

T 

tangent plane 107 
to a locus 194 
tangent vector 7 
tanh (hyperbolic function) 22 
Taylor polynomial 

as “best-fitting” polynomial 88,95-96 
for a vector-valued function 97 
linear terms constitute derivative 99 
multiple variables 94-100 
definition (using $\Delta\mathbf{x}\cdot\nabla$) 94
one variable 77-90 
definition 77 

error estimates 77-79, 82-85 
two variables 90-94 
definition 90 

with the differential operator $\Delta\mathbf{x}\cdot\nabla$
Taylor’s formula 


for a vector-valued function 98 
remainder as $O(n + 1)$ 98

multivariable case 94-100 
a bound on the remainder 96 
integral remainder 94 
Lagrange’s remainder 94 
remainder as $O(n + 1)$ 95

remainder as microscope equation 95 
one-variable case 79-82 
a bound on the remainder 85 
integral remainder 79 
Lagrange’s remainder 82 
remainder as $O(n + 1)$ 87

remainder as microscope equation 83 
significance of various derivatives 
223-224 

two-variable case 91-94 
integral remainder 91, 94 
with the differential operator $\Delta\mathbf{x}\cdot\nabla$ 93
Taylor’s theorem see Taylor’s formula 
Taylor, A. vii 
Thom, R. ix, xi 
topology see plane topology 
torus 147
total flux 389 
components 390 
for a given parametrization 395 
of the curl of a vector field 485 
through a single parallelogram 394 
trace (of a matrix) 36 
transpose (of a matrix) 42, 226 
conjugate transpose 267 
transposition (permutation) 65 
transverse intersection 200 
triple integral 309-310 
tubular neighborhood 289 

U 

uniform 

continuity 168 
dilation 30,40 
unit normal 

on a smooth surface 416 
unit tangent vector 17 
universal gravitation 269 
unoriented integral 296 
upper Darboux integral, sum 300 

V 

$\underline{V}$, $\overline{V}$ inner, outer volume 309


93
vanish

at least to order $p$ (“$O(p)$”)
at an arbitrary point 90 
at the origin 87 
for a multivariable function 95 
for a vector-valued function 98 
fails to vanish to order p 90, 95, 98 
to order greater than p (“o(p)”) 
at an arbitrary point 89 
at the origin 85 
for a multivariable function 95 
for a vector-valued function 98 
to the same order 
at an arbitrary point 90 
at the origin 86 
vector field 7 
conservative 25 
flow field 388 
gradient 25 
gravitational 269 
path-independent 25 
vector potential 508 
vector-valued function 7 
volume 

as 3-dimensional Jordan content 308 
as given by a double integral 308-309 
vortex 462 

vortex flow field 473 
vortex lines 486 
vorticity 464 

induced by shearing 463-464
quantified by circulation 464-465
vorticity flow field 473 

W 

W ( C ) winding number of C 430 
Watson, A. xiii 
wedge product 

of differential forms 423 
of vectors 41,46 
Whitney, H. ix 
Widder, D. vii 
winding number 430 
window 112 

microscope see microscope window 
window coordinates 113 
window equation 115,172 
window map 162,172 
wine bottle example 229-237 
work 6-9 

coordinate components 7, 12 
infinitesimal 13 

Y 

Yap, S. L. 497 
Z 

z-score 28 


